Abstract. Dynamically typed object-oriented languages enable programmers
to write elegant, reusable and extensible programs. However, with the current
methodology for program verification, the absence of static type information
creates significant overhead. Our proposal is two-fold: First, we propose a layer
of abstraction hiding the complexity of dynamic typing when provided with
sufficient type information. Since this essentially creates the illusion of
verifying a statically-typed program, the effort required is equivalent to the
statically-typed case.

Second, we show how the required type information can be efficiently
derived for all type-safe programs by integrating a type inference algorithm
into Hoare logic, yielding a semi-automatic procedure allowing the user to focus
on those typing problems really requiring his attention. While applying type
inference to dynamically typed programs is a well-established method by now,
our approach complements conventional soft typing systems by offering formal
proof as a third option besides modifying the program (static typing) and
accepting the presence of runtime type errors (dynamic typing).


1 Introduction 

Dynamically typed programming languages refrain from restricting their programs
to ensure operations are only applied to suitable operands. While this
allows experienced programmers to write more elegant, concise and reusable
code, it has the obvious drawback that type errors may occur at runtime.

Recently, object-oriented dynamically typed languages like Python, Ruby
and JavaScript have been gaining popularity, also on the server side (Ruby on Rails,
node.js), and are used even for business- [27] and safety-critical [3] applications.

Unfortunately, despite the growing need for correctness guarantees, the lack 
of type information causes a large overhead in formal methods like Hoare logic 
and severely decreases the effectiveness of automatic reasoning engines compared 
to the statically-typed setting (see Section 2.1). 
There are two ways to deal with this problem:

1) Annotation: Most contemporary approaches to verifying dynamically
typed programs ask the user to manually supply the needed type information
in loop invariants and method contracts [12,23,29,21]. For larger programs, this
induces significant overhead. We argue that manually supplying type information
for all variables is not only tedious, but also often unnecessary, as most of this
information could have been inferred automatically.

2) Translation: Obviously, translating the dynamically typed program into
an equivalent statically typed version¹ and then using a Hoare logic for
statically-typed programs (like [2,5]) for verification is also possible. In such a
translation process, type inference algorithms like [17,11] are usually of significant help.
Note, however, that gradual typing [26,4] is not useful in this context, as such
Hoare logics require the entire program to be well-typed prior to verification.
Additionally, this approach removes any benefit of dynamic typing since it is
equivalent to verifying a statically typed language with type inference.

We propose to get the best of both worlds by integrating an automatic type 
safety verifier with Hoare logic into a semi-automatic procedure and using the 
derived type information to reduce overhead and enable effective automated 
reasoning about dynamically typed programs just like with statically typed ones. 
In the context of soft typing [6], our approach can also be understood as offering
proofs of type safety as a third option besides rewriting the program (static
typing) and runtime checks (dynamic typing).

Concretely, in this paper we describe two components: 

1) A layer of abstraction that, given suitable type information, abstracts 
from the complexities of dynamic typing and hence reduces the verification of 
dynamically typed programs to that of statically typed ones. This also works 
with partial type information on a per-expression basis (see Section 2.1). 

2) A construction for complementing a Hoare logic with an automatic type 
safety verifier, yielding a semi-automatic procedure for deriving type information 
with the following properties (see Section 2.4): 

- Automation: only typing problems beyond the reach of the automatic verifier
require manual intervention.

- Completeness relative to the Hoare logic: if the Hoare logic is complete, then
type information can be derived for all type-safe programs (see Section 5.2).

- Bidirectional exchange of results: automatically derived type information
can be used in Hoare logic proofs and, vice versa, proof results are used by
the automatic verifier to increase precision.

Together, these two components form a novel verification system that makes
the effort additionally required to verify a dynamically typed program proportional
to the total complexity of hard typing problems in this program. Unlike
in annotation-based approaches, programs with only trivial typing problems
require no additional effort and, unlike in translation-based approaches, all type-safe
programs can be verified.

¹ After this translation, the static type system should be able to ensure the absence of
type errors, unlike in the embeddings discussed in [15]. Finding such an equivalent
version is undecidable in general and hence requires manual effort (see Section 2.2).

This paper constitutes our first step towards connecting the (relatively)
complete Hoare logics [2,5] and advanced reasoning engines developed for
statically-typed object-oriented languages with the advancing automatic type safety
verifiers for dynamically typed languages [21,29,9,17]. In this extended version,
proofs for all theorems and lemmas can be found in Appendix E.

Notation. $\overline{p}$ is a sequence $p_1,\dots,p_n$ where $n$ is obvious from context or does
not matter, $\{\overline{p}\}$ the smallest set containing all its elements and $\cdot$ sequence
concatenation. $\triangleq$ means “is defined as” and $\mathbb{N}_n = \{0,\dots,n\}$, $\mathbb{N}^+_n = \{1,\dots,n\}$.

2 Overview / Motivation 

We will first discuss how correctness proofs can be simplified using sufficient 
type information and then how this information can be derived. 


2.1 Static- vs. Dynamically-typed Hoare Logic 

Apart from the additional need to establish type safety, there are other differences
between Hoare logics for dynamically typed and statically typed languages
(HLd and HLs). The latter (like [2,5]) usually share a type system between
programming and assertion language: the assertion x > 8 denotes the set of
states where the value of a numeric program variable x is larger than 8. In
HLd (like [12]), however, as types are not statically known, all variables are of
type O (object). The assertion x > 8 is hence meaningless as > is not defined for
type O. In this setting, a similar set of states can be denoted by the assertion²
$\exists i.\ \mathrm{N}(x, i) \wedge i > 8$, which can be automatically derived from x > 8 given sufficient
type information (the fact that the object referenced by x always represents a
number).

Furthermore, HLs usually include side-effect-free (pure) program expressions
(e) in the assertion language, allowing efficient reasoning using proof rules like

$\{q[u := e]\}\ u := e\ \{q\}$

Here, q[u := e] denotes the substitution of all occurrences of a variable u by e
in the assertion q. This rule allows directly deducing weakest preconditions over
assignments like $\{x + 5 > 8\}\ x := x + 5\ \{x > 8\}$ (1) by letting the expression e
traverse the boundary between program and logic. In HLd, this is not possible
since program expressions could have side-effects. While a subset of side-effect-free
methods can be defined, identifying such pure expressions requires type
information. Without it, establishing a property equivalent to (1) requires more
than 6 rule applications.

² The precise meaning of N(x, i) will be explained in Subsection 4.2.
This observation gains significance from the fact that most expressions usually
only involve immutable data types like numbers and strings. Regarding
them as side-effecting operations on general object structures not only
complicates proofs, but also significantly decreases the effectiveness of automated
reasoning engines. For instance, assertions can often be efficiently established
by SMT solvers over Presburger arithmetic, while no similar decision procedure
exists for arbitrary operations over general object structures.

Section 4 will show how type information can be used to counter these problems
and create the illusion of proving a statically typed program.

2.2 Providing Type Information 

Sufficient type information for dynamically typed programs is uncomputable in 
general (Section 5.1). However, a number of good approximations exist [17,11] 
that we will refer to as automatic type safety verifiers. 

It is known that many dynamically typed programs only occasionally diverge
from what would also be possible in static typing disciplines³ and, consequently,
that the output of such algorithms is usually sufficient for typing most of their
subexpressions [17, Section 5][11, Section 6].

If the entire program can be typed by a sound automatic verifier, then
HLs could be applied. However, the whole point of dynamic typing is the
possibility to go beyond the limits of such automatic procedures (type systems).
Approaches to verifying these languages thus must also be able to operate under
less ideal circumstances. The following example will illustrate this point.

2.3 The Evaluator Example 

Figure 1 depicts a dynamically typed program evaluating arithmetic expressions.
While crafted to provide a hard typing problem, its use of ad-hoc data structures
is not uncommon in Ruby, Python or JavaScript.

The class Evaluator has two methods, parse() and calc(). The former
parses a string and stores the resulting parse tree in the instance variable @tree,
while the latter evaluates a given parse tree (defaulting to @tree) over a given
environment (a mapping from variable names (strings) to integers).

The example is hard to type because the parse trees are represented as ad-hoc
constructions of nested lists. Numeric constants VALUE, VAR and OP in the
first element distinguish value-, variable- and operation nodes. The types of
the remaining list elements depend on these node types: the second element is
numeric (the value) for value-nodes, a string (the variable name to be looked up
in the environment) for var-nodes and numeric (representing the operation to
be performed) for op-nodes. Only op-nodes use nesting: further list elements are
sub-parse-trees that are to be recursively evaluated to operands.

³ Advanced dynamic features like mixins, traits, method update and dynamic class
hierarchies increase the complexity of type inference. However, in this paper we aim
to study the problem of dynamic typing in isolation and leave them as future work.
class Evaluator {

  method parse(str) {...}

  method calc(env, tree = @tree) {
    if tree[0] = VALUE then tree[1]
    elseif tree[0] = VAR then env[tree[1]]
    elseif tree[0] = OP then
      if tree[1] = ADD then calc(env, tree[2]) + calc(env, tree[3])
      elseif ...
      else nil
      fi
    else nil
    fi
  }

}

new Evaluator().parse(input).calc(ENV)

Fig. 1. Relevant part of the evaluator example source code 


Typing this example requires deducing precise types for heterogeneous lists
from propositions (like tree[0] = VALUE) about their first element. To the best
of our knowledge there is no automatic procedure able to establish such
implications. Also note that the typing problem can be made even harder: allowing
an arbitrary number of operands in op-nodes, returning strings instead of null,
etc. This example will be used to demonstrate our technique.
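For readers who want to experiment, the calc() method of Figure 1 can be transcribed into Python, itself a dynamically typed language. This is a hedged sketch: the concrete values of VALUE, VAR, OP and ADD are assumptions (the text only says they are numeric constants), parse() is elided as in the figure so the tree is built by hand, and only addition is modelled.

```python
# Assumed tag values; the paper only states that these are numeric constants.
VALUE, VAR, OP = 0, 1, 2
ADD = 0  # assumed operation code; other operations are elided as in Fig. 1


def calc(env, tree):
    """Evaluate a parse tree encoded as nested, heterogeneous lists."""
    if tree[0] == VALUE:
        return tree[1]                    # tree[1] is a number
    elif tree[0] == VAR:
        return env[tree[1]]               # tree[1] is a variable name (string)
    elif tree[0] == OP:
        if tree[1] == ADD:                # tree[1] is an operation code
            return calc(env, tree[2]) + calc(env, tree[3])
        return None                       # remaining operations elided
    return None                           # corresponds to the final `else nil`


# Tree for "x + 2", built by hand because parse() is elided in Fig. 1.
tree = [OP, ADD, [VAR, "x"], [VALUE, 2]]
print(calc({"x": 40}, tree))  # 42
```

The comments make the typing problem visible: the type of tree[1] depends on the value stored in tree[0], which is exactly the kind of implication discussed above.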

2.4 Semi-Automation 




Fig. 2. Overview of the concept 


In the concept depicted in Figure 2, the correctness proof is split into two
“layers” (see Section 6.3). While the user (supported by a theorem prover) derives
his proof in the higher layer, the lower layer contains type information and
is created and modified solely by the automatic type safety verifier. For this
purpose, the typings (ty) derived for the program π by the verifier are translated
into proofs (see Section 6.2). While the information contained in this lower-layer
proof is already useful for supporting the user’s higher-layer proof (see
Section 4), the user may at any time decide to refine it by deriving more precise
type information in the higher layer. This information is filtered to make it
interpretable for the verifier and then supplied as trusted assumptions to refine
the lower-layer type information (see Sections 6.1, 6.3).

Note that deriving type information and using the layer of abstraction is not
a strict 2-step process. The latter requires the former only on a per-expression
basis, allowing an interleaving of steps. Concretely, the layer of abstraction
applies to all expressions proven type-safe (see Section 4). Expressions with open
typing problems may be included at any time by proving them type-safe. This
interleaving is possible as our refinements are monotonic (see Section 6.3).

3 Setting 

3.1 Model Languages: Dyn and Stat 

To explain our methodology in a setting facilitating formal proof we introduce 
a pair of minimalistic programming languages that differ only in the fact that 
one is dynamically typed (dyn) while the other uses a static type system with 
type inference (stat). Like their real-world siblings, the two are imperative, 
class-based object-oriented languages including inheritance, method renaming, 
dynamic dispatch and constructors. However, they do not support advanced 
dynamic features like a dynamic class hierarchy, method update or eval(). 


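Syntax of dyn:

$u \in V_l,\quad @x \in V_i,\quad C \in \mathcal{C},\quad m \in \mathcal{M}$

$$
\begin{array}{lcl}
\mathit{Prog}_d \ni \pi & ::= & \overline{class}\ S\\
\mathit{Class}_d \ni class & ::= & \texttt{class}\ C < C\ \{\overline{meth}\}\\
\mathit{Meth}_d \ni meth & ::= & \texttt{method}\ m(\overline{u})\{S\} \mid \texttt{rename}\ m\ m\\
\mathit{Stmt}_d \ni S & ::= & S;\ S \mid e\\
\mathit{Exp}_d \ni e & ::= & \texttt{null} \mid u \mid @x \mid \texttt{this} \mid e == e \mid e\ \texttt{is\_a?}\ C \mid e.m(\overline{e}) \mid \texttt{new}\ C(\overline{e})\\
 & \mid & u := e \mid @x := e \mid \texttt{if}\ e\ \texttt{then}\ S\ \texttt{else}\ S\ \texttt{fi} \mid \texttt{while}\ e\ \texttt{do}\ S\ \texttt{od}
\end{array}
$$

Syntax of stat: coincides with dyn, except for

$$
\begin{array}{lcl}
\mathit{Stmt}_s \ni S & ::= & S;\ S \mid e_s \mid u := e_s.m(\overline{e_s}) \mid u := \texttt{new}\ C(\overline{e_s}) \mid u := e_s \mid @x := e_s\\
 & \mid & \texttt{if}\ e_s\ \texttt{then}\ S\ \texttt{else}\ S\ \texttt{fi} \mid \texttt{while}\ e_s\ \texttt{do}\ S\ \texttt{od}\\
\mathit{Exp}_s \ni e_s & ::= & \texttt{null} \mid u \mid @x \mid \texttt{this} \mid e_s == e_s \mid e_s\ \texttt{is\_a?}\ C \mid c_s \mid op(\overline{e_s}) \mid \texttt{if}\ e_s\ \texttt{then}\ e_s\ \texttt{else}\ e_s\ \texttt{fi}
\end{array}
$$

where $c_s \in \mathit{Const}_s$ and $op(\overline{e_s}) \in \mathit{Op}_s$.

Syntactic sugar in dyn:

$e_1 \oplus e_2 \triangleq e_1.m_\oplus(e_2)$
$\texttt{if}\ e\ \texttt{then}\ S\ \texttt{fi} \triangleq \texttt{if}\ e\ \texttt{then}\ S\ \texttt{else null fi}$
$\texttt{false} \triangleq \texttt{new bool(null)},\quad \texttt{true} \triangleq \texttt{false.not()}$
$0 \triangleq \texttt{new num(null)},\quad n \triangleq (n-1)\texttt{.succ()}$

Basic data types in stat:

$\texttt{true}, \texttt{false} \in \mathit{Const}_s,\quad \neg : \mathrm{B} \to \mathrm{B} \in \mathit{Op}_s$
$\wedge, \vee, \rightarrow : \mathrm{B} \times \mathrm{B} \to \mathrm{B} \in \mathit{Op}_s$
$0, 1, \dots \in \mathit{Const}_s,\quad = : \mathrm{N} \times \mathrm{N} \to \mathrm{B} \in \mathit{Op}_s$
$+, *, \mathrm{div} : \mathrm{N} \times \mathrm{N} \to \mathrm{N} \in \mathit{Op}_s$

Fig. 3. Syntax of dyn and stat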


Syntax: The syntax of both dyn and stat is depicted in Figure 3. In dyn,
method bodies consist of statements (S) which, in contrast to expressions (e),
can contain sequential composition. Expressions are composed of null, the only
constant, local and instance variables (prefixed with @), the self-reference this,
operators for object identity and dynamic type checks, method and constructor
calls, assignments, conditionals and while loops. Note that equality (=) is
desugared to a (class-specific) method call, while object identity (==) is a built-in
operation yielding true iff the two expressions refer to the same object (we
stipulate that null == null yields true).

Each class except the predefined class object must specify a parent class
whose methods are inherited. The inheritance relation must be acyclic. Every
class thus transitively inherits from object. Inherited methods may be overridden
or renamed (using rename). Like in actual dynamically typed languages,
inheritance is mere code reuse and can be removed using an automatic expansion
step [22]. Furthermore, we will assume this step to be completed and not
concern ourselves any further with inheritance or renaming.

Semantics: Both dyn and stat programs consist of a main statement S and
sets of classes $\mathcal{C}$, methods $\mathcal{M}$ and variables $V = V_l \uplus V_i$, where $V_l$ and $V_i$ are
the sets of local and instance variables respectively. While each class $C \in \mathcal{C}$
has a subset of method declarations $\mathcal{M}_C \subseteq \mathcal{M}$ and instance variables $V_C \subseteq V_i$,
every method $C.m \in \mathcal{M}$ has a subset of local variables $V_{C.m} \subseteq V_l$ used in its
method body $S_{C.m}$. $V_s = \{\texttt{this}, r\} \subseteq V_l$ is a set of special variables. While this
references the current object and must not be assigned to in programs, r
holds the result of the last evaluated expression and cannot be used in programs.

Dyn’s value domain is the set of all objects $\mathcal{V}_d = \mathcal{V}_O$ and its type system
is the lattice of union types represented as sets of class names $\{C_1,\dots,C_n\} \in
\mathcal{T}_d = 2^{\mathcal{C}}$ with the subset ordering $\subseteq$ (see Figure 4). The null value is
contained in every such type. Stat, on the contrary, distinguishes basic data types
$\mathcal{T}_s = \{O, N, B, S, L, M, \dots\}$ and its value domain $\mathcal{V}_s = \biguplus_{T \in \mathcal{T}_s} \mathcal{V}_T$ includes objects,
numbers, booleans, strings, lists and finite maps. We omit definitions of states,
state update etc. as they are standard. To keep track of instance-class
relationships we use class references: for every class C we introduce a distinct object $\rho_C$
as well as a special instance variable @c such that $o.@c = \rho_C$ iff o is an instance
of class C. Using @c in programs is not permitted.



Fig. 4. Type lattices of dyn (left) and stat (right) 


Comparing Dyn with Stat: Dyn is a pure object-oriented language (objects
are the only values) while stat has basic data types. However, both provide the
same constants and pure (i.e. side-effect-free) operations on them. Dyn desugars
them to constructor and method calls (see Figure 3), while stat (as usual in
statically typed languages) provides them built-in ($c_s$ and $op(\overline{e_s})$ in Figure 3).

Also, stat expressions are pure. Side-effects are only allowed in statements,
which must only have pure subexpressions. This is not a restriction, as every
dyn-expression can be transformed into a sequence of stat statements by recursively
(and in the order of evaluation) replacing subexpressions e by fresh local
variables u and prepending the assignment u := e;.
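The flattening of dyn expressions into stat statements can be sketched as follows. This Python version is an illustration under stated assumptions: the tuple-based AST and the function name flatten are hypothetical, and assignments and control flow are left out, so a variable read later in the sequence still holds the value it had when the original expression evaluated it.

```python
import itertools

# Hypothetical AST: a str is a variable (or the constant null), a method call is
# ("call", receiver, name, [args]) and a constructor call is ("new", cls, [args]).
# Assignments and control flow are omitted from this sketch.


def flatten(expr, fresh, out):
    """Append stat-style assignments to `out`; return an operand for expr.

    Compound subexpressions are visited in evaluation order (receiver first,
    then arguments left to right) and each is bound to a fresh local variable,
    so every emitted statement only has variables as operands.
    """
    if isinstance(expr, str):
        return expr
    kind = expr[0]
    if kind == "call":
        _, recv, name, args = expr
        r = flatten(recv, fresh, out)
        a = [flatten(x, fresh, out) for x in args]
        rhs = f"{r}.{name}({', '.join(a)})"
    elif kind == "new":
        _, cls, args = expr
        a = [flatten(x, fresh, out) for x in args]
        rhs = f"new {cls}({', '.join(a)})"
    else:
        raise ValueError(f"unsupported expression: {expr!r}")
    u = next(fresh)
    out.append(f"{u} := {rhs}")
    return u


stmts = []
names = (f"u{i}" for i in itertools.count())
result = flatten(("call", "x", "add", [("call", "y", "succ", [])]), names, stmts)
print("; ".join(stmts))  # u0 := y.succ(); u1 := x.add(u0)
```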

Every stat program is also a dyn program that evaluates to (an object- 
oriented version of) the same result. The only reason that the opposite direction 
does not hold is the language restriction imposed by stat’s static type system. 
Type Errors: Contrary to stat, which rejects programs deemed unsafe at compile
time, dyn allows every syntactically correct program to be executed and
raises type errors at runtime when

- a method call is not supported by its receiver (in this arity), or

- a condition of a conditional or while loop is not boolean.
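The two error conditions can be pictured as runtime checks. In this hedged Python sketch, the class table METHODS, the exception classes and the helpers check_call and check_condition are hypothetical; following the distinction drawn in Section 4.1, calling a method on null is modelled as a failure rather than a type error, and null conditions are deliberately left unmodelled.

```python
class DynTypeError(Exception):
    """A runtime type error; treated as fatal, as in the text."""


class NullFailure(Exception):
    """Calling a method on null: a failure, not a type error."""


# Hypothetical class table: class name -> {method name: arity}.
# (One arity per method name is a simplification of this sketch.)
METHODS = {
    "num": {"succ": 0, "add": 1},
    "bool": {"not": 0},
}


def check_call(receiver_class, method, nargs):
    """Check e.m(e1, ..., en) for a receiver of the given class (None = null)."""
    if receiver_class is None:
        raise NullFailure(f"{method} called on null")
    if METHODS.get(receiver_class, {}).get(method) != nargs:
        raise DynTypeError(f"{receiver_class} does not support {method}/{nargs}")


def check_condition(cond_class):
    """The condition of if/while must evaluate to a bool object."""
    if cond_class is None:
        raise ValueError("null conditions are not modelled in this sketch")
    if cond_class != "bool":
        raise DynTypeError(f"condition of class {cond_class} is not boolean")
```

For instance, check_call("num", "add", 1) passes silently, while check_call("num", "add", 0) raises DynTypeError because the arity does not match.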

While “message not understood” errors are fundamentally linked to type-checking
in class-based OO languages, dynamically typed languages often allow conditions
to be of arbitrary type. Nevertheless, the second error condition models a
common error class where a built-in operation supports a fixed set of types.

Many dynamically typed languages raise type errors when accessing variables
prior to assignment. We will leave this as future work and consider all
local (instance) variables to be initialized to null prior to method executions
(on instantiation). Also, type errors are often treated as exceptions, allowing
interception and handling. For simplicity, we will consider them as fatal.


3.2 Hoare Logic 

The presentation of dyn and stat’s program logics closely follows [2,1]. We start
by introducing the assertion language (Figure 5). Essentially, it is weak second-order
logic, extended with the same constants $c_s$, operations $op(\overline{l})$ and types
used in stat. However, it will be used to reason about both dyn and stat.

Assertions contain typed logical expressions (l). Such expressions consist of
typed logical variables, local/instance program variables u / l.@x (of type O in
dyn and of some type $T \in \mathcal{T}_s$ in stat / same, with l being of type O) including
this, typed constants and typed operations. Contrary to program expressions,
logical expressions can access instance variables of objects other than this.

Logical expressions may only occur as parts of well-typed equations. Following
[5], undefined operations like dereferencing a null value or accessing a
sequence with an index out of bounds ($l[n]$ with $n \geq |l|$) yield a null value, and
equality is non-strict with respect to such values (null = null is true) to ensure a
two-valued logic. Assertions are boolean combinations of such equations allowing
quantification over finite sequences of elements of basic types.

We also introduce the following abbreviation for making reasoning about 
runtime types more convenient: 
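$\mathit{Asrt} \ni p, q ::= \dots \mid l \in T, \qquad T \in \mathcal{T}_d$

$l \in \{C_1,\dots,C_n\} \triangleq l = \texttt{null} \vee l.@c = \rho_{C_1} \vee \dots \vee l.@c = \rho_{C_n}$

The reader may verify that the following implications hold:

$l \in T_1 \wedge l \in T_2 \rightarrow l \in T_1 \cap T_2 \qquad l \notin T \rightarrow l \in \mathcal{C} \setminus T$

$l \in T_1 \vee l \in T_2 \rightarrow l \in T_1 \cup T_2 \qquad l_1 = l_2 \rightarrow \exists T.\ l_1 \in T \wedge l_2 \in T$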
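The abbreviation l ∈ T and the intersection and union implications can be checked exhaustively on a small lattice. In this Python sketch, CLASS_NAMES is a hypothetical four-class instance of 𝒞, an object is represented by the name of its class, and None plays the role of null, which belongs to every union type.

```python
from itertools import combinations

CLASS_NAMES = ("num", "bool", "str", "list")  # hypothetical class set C


def all_types(classes):
    """All union types of the lattice 2^C, each as a frozenset of class names."""
    return [frozenset(c) for r in range(len(classes) + 1)
            for c in combinations(classes, r)]


def member(value_class, t):
    """The abbreviation l ∈ T: l is null (None here) or an instance of a class in T."""
    return value_class is None or value_class in t


TYPES = all_types(CLASS_NAMES)
VALUES = [None, *CLASS_NAMES]  # null plus one representative object per class

# Exhaustively check the intersection and union implications stated above.
for v in VALUES:
    for t1 in TYPES:
        for t2 in TYPES:
            if member(v, t1) and member(v, t2):
                assert member(v, t1 & t2)
            if member(v, t1) or member(v, t2):
                assert member(v, t1 | t2)
print(len(TYPES), "union types checked")  # 16 union types checked
```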
Selected differences between the Hoare-style axiomatic semantics for dyn and
stat are contrasted in Figure 6. While the semantics for stat are standard⁴, the
rules for dyn were modeled after [12]. Omitted rules are listed in Appendix B. In
Hoare triples {p} S {q}, the special variable r is only allowed in the postcondition
q and denotes the return value of S. The rules will be analyzed in the next
section.


4 Layer of Abstraction 

Let us compare the proof rules given in Figure 6. Obviously, the dyn rules are 
more complicated than their stat counterparts. Analyzing their differences, one 
can identify three core reasons why reasoning about dynamically typed programs 
is more complex than reasoning about statically typed ones. 

1. Type safety: In Figure 6, the parts ensuring type safety are [marked]. Such
type safety preconditions are unnecessary in statically typed languages.

2. Mapping objects to values: Hoare logic for dynamically typed languages
often uses predicates to map between program objects and logical values. For
instance, the COND rule has to use the predicate B() to establish a correspondence
between the program expressions e and the logical expression b of type B.
This additional layer of indirection not only reduces readability, but also hinders
substitutions for pure expressions (see next paragraph).

3. Side-effecting expressions: In the stat rules ASGN and COND, the pure
program expressions $e_s$ and $b_s$ are directly used in logical assertions. Here, the clever
design choice of a shared type system pays off. Unfortunately, dynamic typing
forces us to relinquish this benefit, as the types of expressions are not statically
known and impure expressions are ill-suited for logical reasoning. Observe also
how dyn’s METH rule models the evaluation order using a sequence of intermediate
predicates $p_i$, which would not be necessary for pure expressions. However,
since dyn treats operations as method calls, the METH rule needs to be applied
even for pure operations like +, <, ∧, etc., making properties of assignments and
conditionals even more tedious to derive.

The following sections will explain how the layer of abstraction mitigates 
these issues. 

4.1 Type Safety Preconditions 

As already mentioned, the fact that type errors are runtime events in dynamically
typed languages gives rise to the following notion of correctness:

⁴ They closely follow other Hoare logics for statically typed languages [2,5,1].
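$$
\begin{array}{lcl}
p, q \in \mathit{Asrt} & ::= & l = l \mid \neg p \mid p \wedge p \mid \exists v : T^*.\ p \qquad T \in \mathcal{T}_s\\
l \in \mathit{LExp} & ::= & v \mid u \mid l.@x \mid \texttt{null} \mid \texttt{this} \mid \texttt{if}\ l\ \texttt{then}\ l\ \texttt{else}\ l\ \texttt{fi} \mid l = l \mid |l| \mid l[l] \mid c_s \mid op(\overline{l})
\end{array}
$$

with the usual abbreviations: $p \vee q \triangleq \neg(\neg p \wedge \neg q)$, $p \rightarrow q \triangleq \neg p \vee q$, $p \leftrightarrow q \triangleq (p \rightarrow q) \wedge (q \rightarrow p)$,
$\exists v : T.\ p \triangleq \exists v' : T^*.\ |v'| = 1 \wedge p[v/v'[0]]$, $\forall v : T.\ p \triangleq \neg \exists v : T.\ \neg p$

Fig. 5. Syntax of the assertion language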


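Hoare logic rules for dyn and stat (parts in square brackets are type safety preconditions, required only for type-safe partial correctness)

RULE: Assignment (ASGN)

dyn: $$\dfrac{\{p\}\ e\ \{q[u := r]\}}{\{p\}\ u := e\ \{q\}}$$

stat: $$\{p[u := e_s]\}\ u := e_s\ \{p\}$$

RULE: Conditional (COND)

dyn: $$\dfrac{\{p\}\ e\ \{r \wedge \mathit{bool\_test}\} \qquad \{r \wedge b\}\ S_1\ \{q\} \qquad \{r \wedge \neg b\}\ S_2\ \{q\}}{\{p\}\ \texttt{if}\ e\ \texttt{then}\ S_1\ \texttt{else}\ S_2\ \texttt{fi}\ \{q\}}$$

where b is a predicate and $\mathit{bool\_test} \triangleq [r \in \{bool\}] \wedge \mathrm{B}(r, b)$

stat: $$\dfrac{\{p \wedge b_s\}\ S_1\ \{q\} \qquad \{p \wedge \neg b_s\}\ S_2\ \{q\}}{\{p\}\ \texttt{if}\ b_s\ \texttt{then}\ S_1\ \texttt{else}\ S_2\ \texttt{fi}\ \{q\}}$$

RULE: Method Call (METH)

dyn: $$\dfrac{\{p_i\}\ e_i\ \{p_{i+1}[u_i := r]\}\ \text{for } i \in \mathbb{N}_n \qquad \{p_{n+1}\}\ u_0.m(u_1,\dots,u_n)\ \{q\}}{\{p_0\}\ e_0.m(e_1,\dots,e_n)\ \{q\}}$$

stat: $$\dfrac{\{p\}\ u := u_0.m(u_1,\dots,u_n)\ \{q\}}{\{p[u_0,\dots,u_n := e_{s0},\dots,e_{sn}]\}\ u := e_{s0}.m(e_{s1},\dots,e_{sn})\ \{q\}}$$

where $u_i \in V_l$ fresh, $u_i \notin \mathit{var}(e_j) \cup \mathit{change}(e_j)$ for all $i, j \in \mathbb{N}_n$.

RULE: Recursion (REC) (dyn and stat)

$$\dfrac{A \vdash \{p\}\ S\ \{q\} \qquad A \vdash \{p_i\}\ \texttt{begin local}\ \texttt{this}, \overline{u_i} := v'_i, \overline{v_i};\ S_i\ \texttt{end}\ \{q_i\},\ i \in \mathbb{N}^+_n \qquad [p_i \rightarrow v'_i \in \{C_i\}],\ i \in \mathbb{N}^+_n}{\{p\}\ S\ \{q\}}$$

where $\texttt{method}\ m_i(\overline{u_i})\{S_i\} \in \mathcal{M}_{C_i}$ and $A = \{p_1\}\ v'_1.m_1(\overline{v_1})\ \{q_1\}, \dots, \{p_n\}\ v'_n.m_n(\overline{v_n})\ \{q_n\}$.

Fig. 6. Comparison of dynamically typed and statically typed Hoare logic rules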
Definition 1 (Type-safe partial correctness). $\{p\}\ S\ \{q\}$ holds in the sense
of type-safe partial correctness (written $\models_{\tau p} \{p\}\ S\ \{q\}$) if every non-diverging,
non-failing computation starting in a state satisfying p does not abort with a type
error, but ends in a state satisfying q.

The preconditions particular to proof rules for type-safe partial correctness are
called type safety preconditions. Being orthogonal to preventing divergence (↑)
and failures (f) (calling a method on a null value) as well as ensuring the
postcondition (p), these correctness notions may be freely combined. Total
correctness would hence be denoted $\models_{\tau\uparrow f p}$. In the proof rules given in Figure 6 (and
Appendix B), τ-preconditions are [marked].

In HLs, type safety preconditions are unnecessary. Regarding such preconditions,
correctness proofs in statically typed languages resemble those in dynamically
typed languages for type-unsafe correctness notions. Omitting these
preconditions hence is a first step towards proving dynamically typed programs like
statically typed ones. This can be achieved by treating type safety issues
separately from other correctness issues.


Definition 2 (Decomposition). The following rule is added to the proof system
for a type-safe notion of correctness (τX):

$$\dfrac{\vdash_X \{p\}\ S\ \{q\} \qquad \vdash_{\tau p} \{p\}\ S\ \{\texttt{true}\}}{\{p\}\ S\ \{q\}}$$

where $\vdash_X$ refers to the corresponding type-unsafe variant of the proof system,
while $\vdash_{\tau p}$ always refers to the proof system for type-safe partial correctness.


Correctness of the decomposition rule follows directly from the semantic
definition of type-safe partial correctness. Intuitively, it states that whenever
$\models_{\tau p} \{p\}\ S\ \{\texttt{true}\}$ and the precondition p have been established for some statement
S, we can omit type safety preconditions when reasoning about S, although
our program is dynamically typed.


4.2 Mapping Objects to Values 


Mapping predicates are a further peculiarity of HLd. However, when the types
of all used variables are known, those predicates can be generated automatically.
We will now provide a “virtual” variable $\hat{u}$ of the corresponding base type for
each object variable u that can be safely mapped.

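First, a subset of “pure” (i.e. immutable) classes $\mathcal{C}_s \subseteq \mathcal{C}$ along with a function $\mathcal{F}$
mapping classes from $\mathcal{C}_s$ to corresponding base types $T \in \mathcal{T}_s$ of the assertion
language must be defined. For dyn, this mapping is

$\mathcal{F}(num) = N,\quad \mathcal{F}(bool) = B,\quad \mathcal{F}(list) = L,\ \dots$

The mapping can be extended to union types $T \in \mathcal{T}_d$ by defining

$\mathcal{F}(\{\}) = \mathit{Null}$, $\mathcal{F}(\{C\}) = \mathcal{F}(C)$ for $C \in \mathcal{C}_s$, and $\mathcal{F}(T) = O$ otherwise.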





For each type $T \in \mathcal{T}_s$, there is usually already a mapping predicate $T(o, v) :
O \times T \to B$ for mapping objects to values as well as a safety predicate $\mathit{safe}_T(o) :
O \to B$ defining under what condition this mapping is safe. For N these are⁵

$N(o, n) \triangleq \mathit{safe}_N(o) \rightarrow ((o.@pred = \texttt{null} \rightarrow n = 0) \wedge (o.@pred \neq \texttt{null} \rightarrow N(o.@pred, n - 1)))$ and

$\mathit{safe}_N(o) \triangleq o \neq \texttt{null} \wedge o \in \{num\}$
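The recursive definition of N can be read as a check on chains of predecessor objects. This Python sketch assumes, as suggested by the sugar 0 = new num(null) and n = (n−1).succ(), that a num object stores its predecessor in @pred (here the attribute pred) and that succ() creates a new object whose predecessor is the receiver; the class Num and the helpers safe_n and n_pred are hypothetical names.

```python
class Num:
    """Sketch of a dyn num object: its only field is the predecessor @pred."""

    def __init__(self, pred=None):
        self.pred = pred          # pred=None: this object is new num(null), i.e. 0

    def succ(self):
        return Num(self)          # assumption: succ() wraps the receiver


def safe_n(o):
    """safe_N(o): o is not null and o is an instance of num."""
    return o is not None and isinstance(o, Num)


def n_pred(o, n):
    """N(o, n): if safe_N(o) holds, o represents the natural number n."""
    if not safe_n(o):
        return True               # the implication holds vacuously
    if o.pred is None:
        return n == 0
    return n_pred(o.pred, n - 1)


three = Num().succ().succ().succ()
print(n_pred(three, 3), n_pred(three, 2))  # True False
```

The vacuous case mirrors the implication in the definition: N says nothing about objects for which the safety predicate fails, which is why the safety predicate has to be established separately in the lower layer.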

We then introduce a new assertion language $\mathit{Asrt}_{\mathsf{T}}$ allowing the use of
automatically mapped virtual variables $\hat{x}$. Its semantics is defined in terms of a
mapping $\mathsf{T} : \mathit{Asrt}_{\mathsf{T}} \to \mathit{Asrt}$ to the old assertion language.

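Definition 3 (Automatic Variable Mapping).

Let $x_1,\dots,x_n$ be a sequence of variables that can be safely mapped to types
$T_1,\dots,T_n$ and for which $\hat{x}_i$ occur free in p for $i \in \mathbb{N}^+_n$. Also, let $v_{x_1},\dots,v_{x_n}$ be a
corresponding sequence of logical variables of types $T_1,\dots,T_n$. Then,

$\mathsf{T}(p) \triangleq \exists v_{x_1} : T_1, \dots, v_{x_n} : T_n.\ \mathsf{T}_s(p) \wedge \mathsf{T}_m(p)$

$\mathsf{T}_s(p) \triangleq p[\hat{x}_1,\dots,\hat{x}_n := v_{x_1},\dots,v_{x_n}],\qquad \mathsf{T}_m(p) \triangleq T_1(x_1, v_{x_1}) \wedge \dots \wedge T_n(x_n, v_{x_n})$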

The precise definition of which variables can be “safely mapped” depends
on the type information available. For the verifier that will be discussed in
Section 6.1, the $x_i$ may be local variables u or instance variables of the current object
this.@x. Note that $\mathit{Asrt}_{\mathsf{T}}$ conservatively extends Asrt, as any assertion $p \in \mathit{Asrt}$
is mapped to itself. We hence assume $\mathsf{T}$ to be implicitly applied to all assertions,
enabling the pervasive use of automatic object mapping. For instance, assuming
that $\mathit{safe}_N(u)$ could be established in the lower layer, the $\mathsf{T}$-assertion $\hat{u} < 5$ can
be used instead of the equivalent $\exists v_u : N.\ v_u < 5 \wedge N(u, v_u)$. To formally show
that the automatic object mapping allows us to trivially map stat assertions
into dyn assertions, we need a mapping θ between their states.
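Definition 3 is a purely syntactic transformation, so it can be sketched as a string rewrite. In this Python illustration, the ASCII syntax hat(x) for a virtual variable, the function name expand and the textual connectives exists and & are assumptions; only local variables are handled, not instance variables this.@x.

```python
import re

# Hypothetical ASCII syntax: hat(x) denotes the virtual variable of x.
HAT = re.compile(r"hat\((\w+)\)")


def expand(p, types):
    """Map an assertion with virtual variables to plain Asrt (Definition 3).

    `types` maps each safely mappable local variable x to its base type T;
    the mapping predicate of T is written T(x, v_x).
    """
    xs = sorted(set(HAT.findall(p)))          # variables whose hat(x) occurs in p
    if not xs:
        return p                              # assertions in Asrt map to themselves
    body = HAT.sub(lambda m: f"v_{m.group(1)}", p)              # T_s(p)
    maps = " & ".join(f"{types[x]}({x}, v_{x})" for x in xs)    # T_m(p)
    binders = ", ".join(f"v_{x} : {types[x]}" for x in xs)
    return f"exists {binders}. ({body}) & {maps}"


print(expand("hat(u) < 5", {"u": "N"}))
# exists v_u : N. (v_u < 5) & N(u, v_u)
```

The substituted body is parenthesized so that the conjunction with the mapping predicates keeps the intended scope even when p itself contains disjunctions.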

Translating States: $\theta(\sigma_s) = \sigma_d$, where $\sigma_d$ is derived from $\sigma_s$ by introducing
for every base type $T \in \mathcal{T}_s \setminus \{O, \mathit{Null}\}$ a (possibly infinite) set of objects $\{o_v \mid v \in \mathcal{V}_T\}$
with $T(o_v, v)$, and substituting every variable x of base type T holding the value
$v \in \mathcal{V}_T$ by a variable x of type O referencing the object $o_v$. Furthermore, for
each base type $T \in \mathcal{T}_s \setminus \{O, \mathit{Null}\}$, we identify any two objects $o_1, o_2$ iff $T(o_1, v_1)$,
$T(o_2, v_2)$ and $v_1 = v_2$. We lift this equivalence to dyn states in the natural way.
Translating Assertions: $\theta(p) = p[x_1,\dots,x_n := \hat{x}_1,\dots,\hat{x}_n]$, where the $x_i$ are all
variables that can be safely mapped and occur free in p.

Theorem 1. For all assertions p and stat states σ: $\sigma \models p$ iff $\theta(\sigma) \models \theta(p)$.

The automatic mapping requires safety predicates to be pre-established in the 
lower layer, which requires both type information and tracking of null values. 

4.3 Pure Expressions 

HLs allow highly effective reasoning by including (syntactically identified) pure
program expressions in their logical assertions. In this section, we will show
that, assuming the availability of type information in the lower layer, this concept
is also applicable to dynamically typed languages.

⁵ Expressing them using quantification over sequences instead of recursion is possible,
but less readable.

To define a pure subset of dyn expressions, one complements the set of “pure”
classes $\mathcal{C}_s$ with a set of “pure” (i.e. side-effect-free) methods $\mathcal{M}_s \subseteq \mathcal{M}$ and
extends the function $\mathcal{F}$ to also map method and constructor calls to corresponding
logical expressions. Such an expression $l \in \mathit{LExp}$ of type T with free variables
$v_0, \dots, v_n$ of types $T_0, \dots, T_n$ can be interpreted as a function $f_l : T_0 \times \dots \times T_n \to T$.
We hence denote its type as $\mathit{LExp}(T_0 \times \dots \times T_n \to T)$. The extension of the
mapping $\mathcal{F}$ can then be written as follows:

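For every pure operation m of arity n:

$\mathcal{F} : (T_0.m(T_1,\dots,T_n) \to T) \mapsto \mathit{LExp}(T_0 \times \dots \times T_n \to T)$

For every pure constructor new C of arity n:

$\mathcal{F} : (\mathcal{F}(C).\mathit{init}(T_1,\dots,T_n) \to T) \mapsto \mathit{LExp}(T_1 \times \dots \times T_n \to T)$

For the type N these are

$\mathcal{F}(N.\mathit{init}(\mathit{Null})) = 0,\quad \mathcal{F}(N.\mathit{init}(N)) = v_1 + 1,$

$\mathcal{F}(N.\mathit{add}(N)) = v_0 + v_1,\quad \mathcal{F}(N.\mathit{succ}()) = v_0 + 1.$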

It is then possible to define a predicate $pure(e)$ that automatically identifies pure expressions, given type information for all variables used. $\mathcal{E}$ can be extended to map such pure program expressions to typed logical expressions. We denote the type of a pure expression by $\tau(e)$. Then, after establishing that

$$\{p[r := \mathcal{E}(T_0.m(T_1, \ldots, T_n) \to T)]\}\ u_0.m(u_1, \ldots, u_n)\ \{p\}$$

with $T_i = \tau(u_i)$ for all $i \in \mathbb{N}_n$ holds for all methods in $M_S$, the following axiom can be established by induction over the structure of $e$:

AXIOM: PURE EXPR: $\{p[r := \mathcal{E}(e)]\}\ e\ \{p\}$ where $pure(e)$

Combining the axiom with dyn-specific proof rules yields simplified rules for pure expressions that closely resemble those for stat. For instance:


AXIOM: PURE ASGN
$\{p[x := \mathcal{E}(e)]\}\ x := e\ \{p\}$
where $pure(e)$ and $\tau(e) \sqsubseteq \tau(x)$.


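RULE: PURE COND

$$\frac{\{p \wedge \mathcal{E}(e)\}\ S_1\ \{q\} \qquad \{p \wedge \neg\mathcal{E}(e)\}\ S_2\ \{q\}}{\{p\}\ \texttt{if}\ e\ \texttt{then}\ S_1\ \texttt{else}\ S_2\ \texttt{fi}\ \{q\}}$$

where $pure(e)$ and $\tau(e) = \mathbb{B}$.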
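Once $\mathcal{E}(e)$ is known, PURE ASGN and PURE COND reduce to syntactic manipulation of assertions. The following sketch assumes, for illustration only, that quantifier-free assertions and logical expressions are written in Python expression syntax, so that Python's `ast` module can perform the substitution $p[x := \mathcal{E}(e)]$; all helper names are hypothetical, and Python 3.9 or later is needed for `ast.unparse`.

```python
import ast
import copy
from typing import Tuple


class _Substitute(ast.NodeTransformer):
    """Replace every occurrence of the variable `name` by a copy of `replacement`."""

    def __init__(self, name: str, replacement: ast.expr) -> None:
        self.name = name
        self.replacement = replacement

    def visit_Name(self, node: ast.Name) -> ast.AST:
        # The replacement is returned without being visited, so x is not
        # substituted again inside it (e.g. for x := x + 1).
        if node.id == self.name:
            return copy.deepcopy(self.replacement)
        return node


def substitute(p: str, x: str, lexp: str) -> str:
    """Compute p[x := lexp] for a quantifier-free assertion in Python syntax."""
    tree = ast.parse(p, mode="eval")
    replacement = ast.parse(lexp, mode="eval").body
    new_tree = ast.fix_missing_locations(_Substitute(x, replacement).visit(tree))
    return ast.unparse(new_tree.body)


def pure_asgn_pre(post: str, x: str, lexp: str) -> str:
    """PURE ASGN: the precondition of x := e for postcondition p is p[x := E(e)]."""
    return substitute(post, x, lexp)


def pure_cond_pres(pre: str, cond_lexp: str) -> Tuple[str, str]:
    """PURE COND: preconditions of the two premises, p ∧ E(e) and p ∧ ¬E(e)."""
    return f"({pre}) and ({cond_lexp})", f"({pre}) and not ({cond_lexp})"


# x := y.succ() with E(y.succ()) = y + 1 and postcondition x == y + 1
print(pure_asgn_pre("x == y + 1", "x", "y + 1"))
```

Because the assertions are quantifier-free, no variable capture can occur; with binders, a capture-avoiding substitution would be required.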


Definitions for $pure(e)$, $\mathcal{E} : Expr_{dyn} \mapsto LExp$ and $\tau(e)$, as well as omitted rules and soundness proofs, can be found in Appendix A. Finally, we are able to state the main theorem of this section: in combination with decomposition and automatic object mapping, the above rules allow verification just like in statically typed languages. This follows from the fact that stat proofs closely resemble dyn proofs using these techniques.

Translating Programs: Since $stat \subseteq dyn$, we simply have $\Theta(S) = S$.
Translating Proofs: $\Theta(\varphi) = \psi$ is defined inductively over the structure of the proof $\varphi$ in Hoare logic for stat. Applications of the rules ASGN, COND, LOOP and METH are replaced by applications of PURE ASGN, PURE COND, PURE LOOP and PURE METH + PURE ASGN, respectively. Note that this is always possible, as stat expressions are pure and well-typed pure assignments preserve safety predicates. Applications of all other rules can be preserved, as they are identical for dyn and stat.

Theorem 2. For every stat program $S$ and every correctness proof $\varphi$ of a property $\{p\}S\{q\}$ in Hoare logic for a particular correctness notion of stat programs, $\Theta(\varphi)$ is a valid proof of the property $\{\Theta(p)\}S\{\Theta(q)\}$ in Hoare logic for a corresponding type-unsafe correctness notion of dyn programs.

Furthermore, since types for stat programs can be inferred, their type-safety proofs can be constructed automatically (see Section 6.2). Applying the decomposition rule (Definition 2) then yields a proof of type-safe correctness. It follows that, for statically typable programs, deriving a proof in dyn (using the layer of abstraction) requires no more effort than deriving it in stat. The remainder of this paper discusses how the layer of abstraction can be applied to arbitrary dynamically typed programs by deriving the necessary type information. In Section 7, we demonstrate this by proving the evaluator example correct.

5 Deriving Type Information 

5.1 Type Information and Type Safety 

A program $\pi$ is called type-safe if no execution of $\pi$ can result in a type error. Type safety is the problem of deciding whether a given program is type-safe. Since type errors can be regarded as a form of output, type safety is a nontrivial semantic property and hence, by Rice’s theorem, undecidable for Turing-complete languages.

A type $T$ is an element of the complete lattice $(\mathcal{T}, \sqsubseteq) = (2^{\mathcal{C}}, \subseteq)$. A typing $ty$ of a program $\pi$ is an arbitrary data structure giving rise to a mapping $ty : Stmt \mapsto \mathcal{T}$ from sub-statements $S$ of $\pi$ to types. It is important to stress that a sub-statement occurring multiple times in $\pi$ is treated as multiple different statements. One can think of statements as represented by their parse tree nodes.

A typing $ty$ for a program $\pi$ is called sound iff, in every execution of $\pi$, whenever a sub-statement $S$ is evaluated to a value $v$, then $v$ is of a type $T \sqsubseteq ty(S)$. A typing $ty$ is at least as precise as another typing $ty'$, written $ty \sqsubseteq ty'$, iff $ty(S) \sqsubseteq ty'(S)$ holds for all statements $S$.
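The type lattice and the precision order on typings can be made concrete with a small sketch. It assumes a hypothetical four-class universe $\mathcal{C}$ and typings stored as dictionaries from statement identifiers to frozensets of class names, with unlisted statements typed $\top$. The helper `consistent_with_run` checks the soundness condition against the observations of a single execution only: it can refute soundness but never establish it.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

# Hypothetical class universe C; every type is a subset of it.
CLASSES: FrozenSet[str] = frozenset({"Integer", "String", "bool", "Null"})
TOP: FrozenSet[str] = CLASSES          # ⊤ = C
BOTTOM: FrozenSet[str] = frozenset()   # ⊥ = ∅

# A typing maps statement identifiers (parse-tree nodes) to types.
Typing = Dict[str, FrozenSet[str]]


def at_least_as_precise(ty: Typing, ty2: Typing) -> bool:
    """ty ⊑ ty2 iff ty(S) ⊆ ty2(S) for all S; unlisted statements are typed ⊤."""
    statements = set(ty) | set(ty2)
    return all(ty.get(s, TOP) <= ty2.get(s, TOP) for s in statements)


def consistent_with_run(ty: Typing, observations: Iterable[Tuple[str, str]]) -> bool:
    """Each (statement, class of the value it evaluated to) must satisfy
    class ∈ ty(statement). One run can refute soundness, never prove it."""
    return all(cls in ty.get(stmt, TOP) for stmt, cls in observations)
```

For example, `{"s1": {"Integer"}}` is at least as precise as `{"s1": {"Integer", "String"}}`, but not the other way round.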

For a program $\pi$, the least precise type-safe typing $ty^{\pi}_{\top}$ is the typing where, for every method call $e_0.m(e_1, \ldots, e_n)$, $ty^{\pi}_{\top}(e_0) = \{C \in \mathcal{C} \mid C \text{ supports method } m \text{ of arity } n\}$; for every conditional or while loop with condition $e$, $ty^{\pi}_{\top}(e) = \{bool\}$; and for all other sub-statements $S$, $ty^{\pi}_{\top}(S) = \top$. By definition, a program $\pi$ is type-safe iff it has a sound* typing $ty$ that is precise enough to establish type safety ($ty \sqsubseteq ty^{\pi}_{\top}$).
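Constructing $ty^{\pi}_{\top}$ and deciding $ty \sqsubseteq ty^{\pi}_{\top}$ is straightforward, as the following sketch suggests. The class table, the `Receiver`/`Condition` encoding of call receivers and branch or loop conditions, and all names are hypothetical. Sub-statements that are neither receivers nor conditions are left out of $ty^{\pi}_{\top}$, since they are mapped to $\top$ and can never violate the check.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Tuple, Union

# Hypothetical class table: class -> (method name, arity) pairs it supports.
CLASS_TABLE: Dict[str, FrozenSet[Tuple[str, int]]] = {
    "Integer": frozenset({("add", 1), ("succ", 0)}),
    "String": frozenset({("add", 1)}),
    "bool": frozenset(),
    "Null": frozenset(),
}
TOP: FrozenSet[str] = frozenset(CLASS_TABLE)


@dataclass(frozen=True)
class Receiver:
    """The receiver e0 of a call e0.m(e1, ..., en)."""
    stmt: str
    method: str
    arity: int


@dataclass(frozen=True)
class Condition:
    """The condition e of a conditional or while loop."""
    stmt: str


Site = Union[Receiver, Condition]


def least_precise_safe_typing(sites: Iterable[Site]) -> Dict[str, FrozenSet[str]]:
    """ty_top: receivers get all classes supporting m/n, conditions get {bool}."""
    ty_top: Dict[str, FrozenSet[str]] = {}
    for site in sites:
        if isinstance(site, Receiver):
            ty_top[site.stmt] = frozenset(
                cls for cls, methods in CLASS_TABLE.items()
                if (site.method, site.arity) in methods
            )
        else:
            ty_top[site.stmt] = frozenset({"bool"})
    return ty_top


def establishes_type_safety(ty: Dict[str, FrozenSet[str]],
                            ty_top: Dict[str, FrozenSet[str]]) -> bool:
    """Decide ty ⊑ ty_top; statements missing from ty are treated as ⊤."""
    return all(ty.get(stmt, TOP) <= allowed for stmt, allowed in ty_top.items())
```

A receiver typed `{"Integer", "Null"}` fails the check, reflecting that calling a method on null is a type error, whereas an unreachable receiver typed $\bot$ (the empty set) passes.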

Type safety verifiers (type inference algorithms) derive a typing for a given program by over-approximating its behavior. A verifier is sound iff all typings it derives are sound.


* If a method call, conditional or while loop is unreachable, sound typings may assign the type $\bot$ to its receiver or condition.
Note that, given a typing $ty$ for a program $\pi$, it is straightforward to decide $ty \sqsubseteq ty^{\pi}_{\top}$. In fact, this would even be possible if $ty$, like $ty^{\pi}_{\top}$, assigned $\top$ to all non-receiver, non-condition subexpressions. However, deciding soundness usually requires more information. For this reason, sound type safety verifiers usually a) assign types to all subexpressions and b) provide a set of inference rules (commonly called a “type system”) that allows checking the safety of their derived typings using this additional type information. A soundness proof for these rules with respect to the semantics of the programming language is a crucial part of proving such algorithms sound.

Some algorithms (e.g. context-sensitive ones) even assign multiple types to each statement (one for each context). While $ty(S)$ in this case yields the union of all types assigned to $S$, the soundness proof may differentiate these types.

As type safety verifiers differ, so do their typings. We associate with each verifier $V_X$ a kind of typing capturing its respective format and restrictions. Between such kinds, it is possible to translate in both directions. However, as the precision achievable with a verifier $V_X$ varies, so does the precision expressible using $V_X$-typings. For instance, while it is usually possible to translate path-insensitive typings into path-sensitive ones by assigning the same types to each path, the reverse direction entails merging paths and thus a loss of precision.
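The asymmetry between the two translation directions can be illustrated with a sketch, assuming a hypothetical representation of a path-sensitive typing at one program location as a list of variable-to-type maps (one per path, each listing the same variables). Merging joins each variable's types over all paths; splitting copies the merged types to every path.

```python
from typing import Dict, FrozenSet, List

PathTyping = List[Dict[str, FrozenSet[str]]]  # one variable -> type map per path
FlatTyping = Dict[str, FrozenSet[str]]         # one variable -> type map for the location


def merge_paths(paths: PathTyping) -> FlatTyping:
    """Path-sensitive -> path-insensitive: join (union) each variable's types over all paths."""
    merged: FlatTyping = {}
    for env in paths:
        for var, typ in env.items():
            merged[var] = merged.get(var, frozenset()) | typ
    return merged


def split_paths(flat: FlatTyping, n_paths: int) -> PathTyping:
    """Path-insensitive -> path-sensitive: assign the same types to every path."""
    return [dict(flat) for _ in range(n_paths)]
```

If one path has `x` and `y` both `Integer` and another has both `String`, merging yields `{Integer, String}` for each variable and forgets that the two always agree; splitting the merged result back cannot recover this correlation, while merging a split typing returns it unchanged.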

5.2 Type Safety Proofs are Type Information 

A type safety proof for a statement $S$ is a proof of the property $\{p\}S\{true\}$ for some precondition $p$ in Hoare logic for type-safe partial correctness. By establishing all type safety preconditions, it ensures that $S$ is type-safe when run from a state satisfying $p$.

Such proofs constitute a kind of typing, as their assertions contain type information that is by definition sufficient to establish type safety. Soundness of these typings can be validated using the proof rules of Hoare logic. Before discussing how to extract type information from a Hoare logic proof, we note that this information needs to be compatible with the type safety verifier to be useful for our purpose. We hence define typing assertions $TAsrt \subseteq Asrt$ as a subset of the assertion language modeling the capabilities of this verifier.

For instance, the verifier $V_{Ex}$ presented in Section 6.1 is based on a flow-sensitive, path-sensitive data flow analysis. As usual, only local variables of the current method and instance variables of the current object are tracked flow-sensitively. The remainder of the heap is abstracted into a finite number of type variables $\lfloor C.@x \rfloor$, one for each instance variable $@x$ of each class $C$.

Logically, $V_{Ex}$ establishes a global typing invariant of the form

$$\mathcal{I}_{Ex}(ty) = \forall o.\ \bigwedge_{C \in \mathcal{C}} \Big( \lfloor o \rfloor \in \{C\} \rightarrow \bigwedge_{@x \in V_C} \lfloor o.@x \rfloor \in ty(\lfloor C.@x \rfloor) \Big)$$

for a $V_{Ex}$-typing $ty$, stating that the types assigned to the type variables $\lfloor C.@x \rfloor$ in $ty$ over-approximate the actual types of those instance variables. Also, automatic verifiers provide, for each program location, the return type of the previously executed expression as well as the types of all variables tracked flow-sensitively. Logically, these can be regarded as a conjunction of typing literals (see below). Additionally, path sensitivity allows distinguishing the different paths leading to a program location and hence requires expressing alternatives, which leads us to a disjunctive normal form of typing literals. Hence, only the literals allowed in typing assertions are verifier-specific*. For $V_{Ex}$ we define

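$$TAsrt \ni \tau ::= l \mid \tau \vee \tau \mid \tau \wedge \tau, \qquad TLit_{Ex} \ni l ::= \lfloor r \rfloor \in T \mid \lfloor u \rfloor \in T \mid \lfloor this.@x \rfloor \in T$$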
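The global typing invariant $\mathcal{I}_{Ex}(ty)$ stated above can be tested against a concrete heap, as in the following sketch. The heap encoding (object identifiers mapped to a class and a field map, with `None` standing for null) and all names are hypothetical, and fields without an entry in the typing are treated as unconstrained. Passing the check on one heap is a test, not a proof of the invariant.

```python
from typing import Dict, FrozenSet, Optional, Tuple

# Hypothetical heap snapshot: object id -> (class, field -> referenced object id or None).
Heap = Dict[int, Tuple[str, Dict[str, Optional[int]]]]
# Types of the type variables ⌊C.@x⌋ in a V_Ex-typing: (class, field) -> type.
FieldTyping = Dict[Tuple[str, str], FrozenSet[str]]


def class_of(heap: Heap, ref: Optional[int]) -> str:
    """⌊v⌋: the class of a value; null is the only instance of class Null."""
    return "Null" if ref is None else heap[ref][0]


def satisfies_global_invariant(heap: Heap, ty: FieldTyping) -> bool:
    """I_Ex(ty) on one heap: for every object o of class C and every instance
    variable @x of C, the class of o.@x must lie in ty(⌊C.@x⌋)."""
    for cls, fields in heap.values():
        for field, ref in fields.items():
            allowed = ty.get((cls, field))
            if allowed is not None and class_of(heap, ref) not in allowed:
                return False
    return True
```

A linked list whose last node has `@next = None` satisfies the invariant only if `Null` belongs to the type of $\lfloor Node.@next \rfloor$.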

We now define how to extract type information from Hoare logic proofs. In such a proof, each postcondition may contain flow-sensitive type information about variables as well as about the return value $r$ of the previous expression. Given an assertion $p$, one extracts this information by first converting $p$ into disjunctive normal form, treating typing literals, equations and quantifiers as literals, and then applying a projection $pr_X : Asrt \mapsto TAsrt_X$ that preserves $\wedge$ and $\vee$ while mapping all literals $\notin TLit_X$ to $true$ ($= \lfloor this \rfloor \in \top$). Every assertion $p$ thus implies $pr_X(p)$. Note that, depending on the structure of $p$, there might be a significant loss of precision. This is unproblematic, however, as supplying type information is in the user’s interest. The following theorems show that sufficiently precise type information can always be supplied for type-safe programs.
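The projection $pr_X$ on an assertion already in disjunctive normal form can be sketched as follows. The literal classes `TypeLit` and `OtherLit` are a hypothetical encoding: every literal that is not a typing literal is replaced by $true$, i.e. dropped from its conjunction, while the disjunctive structure is kept.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Union


@dataclass(frozen=True)
class TypeLit:
    """Typing literal ⌊var⌋ ∈ typ (var may be r, a local variable or this.@x)."""
    var: str
    typ: FrozenSet[str]


@dataclass(frozen=True)
class OtherLit:
    """Any other literal: an equation, a quantified formula, a user predicate, ..."""
    text: str


Literal = Union[TypeLit, OtherLit]
DNF = List[List[Literal]]  # disjunction of conjunctions: [] is false, [[]] is true


def project(p: DNF) -> List[List[TypeLit]]:
    """pr_X: keep the ∧/∨ structure, replace every non-typing literal by true (drop it)."""
    return [[lit for lit in conj if isinstance(lit, TypeLit)] for conj in p]


def show(tau: List[List[TypeLit]]) -> str:
    """Render a typing assertion in DNF."""
    if not tau:
        return "false"

    def conj(c: List[TypeLit]) -> str:
        lits = [f"⌊{lit.var}⌋ ∈ {{{', '.join(sorted(lit.typ))}}}" for lit in c]
        return " ∧ ".join(lits) or "true"

    return " ∨ ".join(f"({conj(c)})" for c in tau)
```

The loss of precision mentioned above is visible directly: an equation such as $x = 5$ carries no typing literal, so an assertion consisting only of it projects to $true$.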

Lemma 1. For every assertion $p$ and every $V_X$-typing assertion $\tau$ such that $p \rightarrow \tau$, there exists an equivalent assertion $p' \equiv p$ such that $pr_X(p') = \tau$.

A $V_X$-typing assertion $\tau$ is most precise for an assertion $p$ iff $p \rightarrow \tau$ and, for all $V_X$-typing assertions $\tau'$, $p \rightarrow \tau'$ implies $\tau \rightarrow \tau'$.

Theorem 3. For every verifier $V_X$, each type safety proof $\psi$ has an equivalent proof $\psi'$ such that, for every assertion $p'$ in $\psi'$, $pr_X(p')$ is most precise for $p'$.

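Furthermore, one can define a projection $pr^x_X$ that further projects $V_X$-typing assertions to summary types for the variable $x$ such that, for all assertions $p$, all variables $x$ and all verifiers $V_X$, we have $p \rightarrow \lfloor x \rfloor \in pr^x_X(p)$. For $V_{Ex}$:

$$pr^x_{Ex}(\lfloor x \rfloor \in T) = T, \qquad pr^x_{Ex}(\lfloor x' \rfloor \in T) = \top \text{ with } x' \neq x,$$
$$pr^x_{Ex}(\neg l) = \top \setminus pr^x_{Ex}(l),$$
$$pr^x_{Ex}(\tau \wedge \tau') = pr^x_{Ex}(\tau) \sqcap pr^x_{Ex}(\tau'), \qquad pr^x_{Ex}(\tau \vee \tau') = pr^x_{Ex}(\tau) \sqcup pr^x_{Ex}(\tau')$$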
We extend $pr^x_X$ to assertions by defining $pr^x_X = pr^x_X \circ pr_X$. Using it, every type safety proof $\psi$ gives rise to a $V_X$-typing $ty_\psi$ assigning every sub-statement $S$ the type $pr^r_X(q_1 \wedge \ldots \wedge q_k)$, where $q_1, \ldots, q_k$ are the postconditions of all Hoare triples of the form $\{p_i\}S\{q_i\}$ in $\psi$.
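The summary projection $pr^x_{Ex}$ and the derived typing $ty_\psi$ can be sketched on typing assertions in disjunctive normal form, covering only the literal, $\wedge$ and $\vee$ cases (the negation case is omitted). Types are frozensets over a hypothetical class universe; $\sqcap$ and $\sqcup$ become set intersection and union, an empty conjunction yields $\top$ and an empty disjunction yields $\bot$. Since $pr^r$ maps $\wedge$ to $\sqcap$, $pr^r(q_1 \wedge \ldots \wedge q_k)$ is computed as the intersection of the individual summaries.

```python
from dataclasses import dataclass
from functools import reduce
from typing import FrozenSet, List

TOP: FrozenSet[str] = frozenset({"Integer", "String", "bool", "Null"})  # hypothetical ⊤
BOTTOM: FrozenSet[str] = frozenset()                                     # ⊥


@dataclass(frozen=True)
class TypeLit:
    var: str
    typ: FrozenSet[str]


TAsrt = List[List[TypeLit]]  # typing assertion in DNF


def summary(x: str, tau: TAsrt) -> FrozenSet[str]:
    """pr^x: literal on x -> its type, other literal -> ⊤, ∧ -> ⊓, ∨ -> ⊔."""
    def lit_type(lit: TypeLit) -> FrozenSet[str]:
        return lit.typ if lit.var == x else TOP

    def conj_type(conj: List[TypeLit]) -> FrozenSet[str]:
        return reduce(lambda a, b: a & b, (lit_type(l) for l in conj), TOP)

    return reduce(lambda a, b: a | b, (conj_type(c) for c in tau), BOTTOM)


def ty_psi(postconditions: List[TAsrt]) -> FrozenSet[str]:
    """Type assigned to S: pr^r(q_1 ∧ ... ∧ q_k), the ⊓ of the summaries of the q_i."""
    return reduce(lambda a, b: a & b, (summary("r", q) for q in postconditions), TOP)
```

A disjunct that says nothing about $x$ contributes $\top$, so the summary for $x$ is only as precise as the weakest path.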

Theorem 4 (Completeness relative to Hoare logic). Given completeness of the Hoare logic, for every type-safe program $\pi$ there exists a type safety proof $\psi$ such that $ty_\psi$ is sound and precise enough to establish type safety: $ty_\psi \sqsubseteq ty^{\pi}_{\top}$.

It follows that no (sound) typing can be more precise than a type safety 
proof. It is hence possible to translate from all other kinds of typings into them 
without incurring any loss of precision. 

* Adding the literals $u = null$ and $this.@x = null$ allows tracking null values.
6 Semi-Automation

As type safety is undecidable and full automation is hence only achievable at the expense of completeness, we instead aim at semi-automation by integrating a suitable automatic type safety verifier into Hoare logic. Such verifiers have to satisfy the following requirements:

— Soundness: the verifier safely over-approximates the actual program behavior.

— Monotonicity: increasing precision cannot create type errors.

— Refinements: the verifier provides an interface to supply trusted assumptions for increasing precision (Section 6.1). These must be treated

• flow-sensitively: assumptions should have an associated program location and affect only data flows from that location onward;

• path-sensitively: assumptions should be able to use disjunctions ($\vee$) to express alternatives. The verifier should treat these alternatives like different paths reaching the associated program location.

— Termination: the verifier terminates on all inputs (programs).*

Note that flow and path sensitivity are required only for refinements**, not for the verifier itself. Also, the chosen Hoare logic must be powerful enough to express the verifier’s reasoning. While we consider these requirements to be modest and can hardly imagine an analysis not amenable to this approach, proving this is difficult. In this paper, we therefore concentrate on automatic verifiers based on flow analysis, a class known to include quite powerful analyses [18,32,28].

6.1 An Exemplary Automatic Type Safety Verifier

In this section, we introduce an exemplary automatic type safety verifier $V_{Ex}$, whose typings serve in the following to complement our abstract discussion with concrete examples.

To also shed some light on the minimum requirements above, we chose a minimalistic verifier that fulfills exactly these criteria. $V_{Ex}$ is based on a sound, flow-sensitive data flow analysis and resembles the work of Palsberg et al. [22]. As required, we allow specifying a set of path-sensitive trusted assumptions. However, the analysis does not introduce path sensitivity by itself.

Flow Sensitivity: Intra-procedurally, local variables as well as instance variables of the current object are tracked flow-sensitively. As usual, this is realized by converting all statements to static single-assignment (SSA) form with respect to these variables prior to analysis. A $V_{Ex}$-typing $ty$ hence assigns one type $ty(\lfloor x \rfloor, L)$ per program location $L$ to each such variable $x$.
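The effect of SSA conversion on flow sensitivity can be illustrated with a minimal renaming of a straight-line block of assignments. This sketch is a simplification under stated assumptions: it ignores branches and loops, which would require $\phi$-functions at join points, and represents each assignment only by its target and the variables it reads. After renaming, each version of a variable can receive its own type; for instance, if the first assignment to `x` stores an Integer and the second a String, `x_1` and `x_2` are typed separately instead of `x` being typed `{Integer, String}`.

```python
from typing import Dict, List, Tuple

# (assigned variable, variables read by the right-hand side)
Assignment = Tuple[str, List[str]]


def to_ssa(block: List[Assignment]) -> List[Assignment]:
    """Rename a straight-line block so that every variable version is assigned once.
    Version 0 denotes the value on entry to the block."""
    version: Dict[str, int] = {}
    renamed: List[Assignment] = []
    for target, reads in block:
        uses = [f"{v}_{version.get(v, 0)}" for v in reads]  # read the current versions
        version[target] = version.get(target, 0) + 1         # fresh version for the target
        renamed.append((f"{target}_{version[target]}", uses))
    return renamed
```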

Path Sensitivity: To realize intra-procedural path sensitivity, for each path $j \in path(\overrightarrow{S})$ from the start of a method to each program location in the method, the previous sub-statement $S$ and each flow-sensitively tracked variable $x$ are assigned distinct types $ty(S, j)$ and $ty(\lfloor x \rfloor, \overrightarrow{S}, j)$, respectively. Program locations are denoted by $\overleftarrow{S}$ (or $\overrightarrow{S}$) for the beginning (end) of a sub-statement $S$ of $\pi$.

* Potentially non-terminating analyses like [16] must be performed iteratively by first generating a base result and then refining it towards higher precision (similar to [28]); thus, they can be interrupted at any time and yield the most precise result reached so far.

** Flow- or path-insensitive assumptions would increase the annotation burden.

Null Pointers: Although [22] is a pure type analysis, we use the same algorithm to also perform null-pointer analysis. This is realized by defining the value null as the only instance of a class Null and explicitly inserting the class Null into all types whose expressions may evaluate to null, instead of implicitly allowing null to be an element of every type’s domain. We hence define

$$\lfloor x \rfloor \in \{Null, C_1, \ldots, C_n\} \equiv x = null \vee \lfloor x \rfloor \in \{C_1, \ldots, C_n\}$$

The interested reader may find a more detailed account of algorithm $V_{Ex}$ in Appendix C.
Fig. 7. Overview of the logics involved and the mappings between them


Typings to Logic: The function $\mathcal{E}_X$ maps the type information contained in a flow-sensitive, path-sensitive $V_X$-typing $ty$ for each program location $L$ into a typing assertion. $V_{Ex}$-typings $ty$ are flow-sensitive, as $ty(\lfloor x \rfloor, L)$, the type of the variable $x$ at program location $L$, takes strong updates into account. Hence, for $V_{Ex}$-typings, the function $\mathcal{E}_{Ex}$ can be defined as:

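$$\mathcal{E}_{Ex}(ty, \overrightarrow{S}) = \bigvee_{j \in path(\overrightarrow{S})} \Big( \lfloor r \rfloor \in ty(S, j) \wedge \bigwedge_{x \in V_{M(\overrightarrow{S})}} \lfloor x \rfloor \in ty(\lfloor x \rfloor, \overrightarrow{S}, j) \Big)$$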

where $S$ is a sub-statement of $\pi$, $M(L)$ denotes the method a program location $L$ belongs to, and $V_{C.m} = var(S_{C.m}) \cup change(S_{C.m})$.

Definition 4 (Refinement of Typings). Let $ty$ be a $V_{Ex}$-typing derived for a program $\pi$. Then a conjunctive refinement step of $ty$ using the trusted assumption $\tau$ at program location $L$ is a quadruple $(ty, \tau, L, ty')$, written $ty \xrightarrow{\tau, L}_{Ex} ty'$, where the $V_{Ex}$-typing $ty'$ is derived for the program $\pi'$ that results from $\pi$ by inserting the statement $\mathcal{R}_\tau$ just before $L$, and $\mathcal{R}_\tau$ is defined inductively as

$$\mathcal{R}_{\lfloor x \rfloor \in T} = x := x \sqcap T \quad (*)$$
$$\mathcal{R}_{\tau \vee \tau'} = \texttt{if}\ \ast\ \texttt{then}\ \mathcal{R}_{\tau}\ \texttt{else}\ \mathcal{R}_{\tau'}\ \texttt{end} \quad (**)$$

* The type filter $x \sqcap T$ is defined in Appendix C.

** The condition does not matter, as $V_{Ex}$ treats conditionals non-deterministically anyway.
Theorem 5. For all conjunctive refinements $ty \xrightarrow{\tau, L}_{Ex} ty'$, $ty' \sqsubseteq ty$ holds.
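The local effect of inserting $\mathcal{R}_\tau$ before $L$ can be sketched as follows, assuming a hypothetical representation of the path-sensitive typing at $L$ as a list of variable-to-type maps and of $\tau$ in disjunctive normal form. Each disjunct becomes one branch of the non-deterministic conditional, and each literal $\lfloor x \rfloor \in T$ becomes the filter $x := x \sqcap T$, modelled as set intersection. The sketch ignores how $V_{Ex}$ propagates refined types beyond $L$; it only shows why, at $L$ itself, a refinement can never add a class, in line with Theorem 5.

```python
from typing import Dict, FrozenSet, List, Tuple

Env = Dict[str, FrozenSet[str]]                      # variable -> type on one path
Assumption = List[List[Tuple[str, FrozenSet[str]]]]  # τ in DNF: ∨ of ∧ of ⌊x⌋ ∈ T


def refine(paths: List[Env], tau: Assumption) -> List[Env]:
    """Types at L after inserting R_τ: one branch per disjunct, one filter per literal."""
    refined: List[Env] = []
    for env in paths:
        for conj in tau:              # if * then R_τ1 else R_τ2 end
            new_env = dict(env)
            for var, t in conj:       # x := x ⊓ T
                new_env[var] = new_env[var] & t
            refined.append(new_env)
    return refined


def joined(paths: List[Env], var: str) -> FrozenSet[str]:
    """Join of var's types over all paths, used to compare precision at L."""
    out: FrozenSet[str] = frozenset()
    for env in paths:
        out = out | env[var]
    return out
```

Refining `x : {Integer, String, Null}` with $\lfloor x \rfloor \in \{Integer\} \vee \lfloor x \rfloor \in \{String\}$ yields two paths, and `Null` disappears from the joined type.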

6.2 Translating Typings into Proofs 

Definition 5 (Typing Proof). A typing proof for a $V_X$-typing $ty$ of a statement $S$ is a minimal* proof of the property $\{p\}S\{true\}$ for some precondition $p$ in Hoare logic for partial correctness that, for every sub-expression $e$ of $S$, contains a Hoare triple $\{p_e\}e\{q_e\}$ with $pr^r_X(q_e) \sqsubseteq ty(e)$.

Technically, a typing proof $\psi$ only establishes soundness of the typing $ty$ (by being a Hoare logic proof and $ty_\psi \sqsubseteq ty$). However, when $ty_\psi \sqsubseteq ty^{\pi}_{\top}$, $\psi$ can be turned into a type safety proof by changing the proof system to Hoare logic for type-safe partial correctness and trivially establishing the type safety preconditions. Hence, typing proofs are well-suited as intermediate steps towards type safety proofs.

Recall that $V_X$-typings can be checked for soundness using a $V_X$-specific inference system. It is hence possible to extend $\mathcal{E}_X$ to mechanically derive a typing proof $\psi = \mathcal{E}_X(\pi, ty)$ for a sound $V_X$-typing $ty$ of a program $\pi$ by translating the rules of this inference system into Hoare logic and establishing $\mathcal{I}_X(ty)$ as a global invariant. In such a proof, each assertion $p$ at program location $L$ is exactly the typing assertion $\mathcal{E}_X(ty, L)$. We hence write $\mathcal{E}_X(\psi, L) = \mathcal{E}_X(ty, L)$.
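Generating the typing assertion $\mathcal{E}_{Ex}(ty, L)$ from a path-sensitive typing is a purely syntactic step. The following sketch assumes a hypothetical encoding of $ty$ at one location as a list with one entry per path $j$, holding the type of the previous sub-statement (used for $\lfloor r \rfloor$) and the types of the tracked variables. It renders the disjunction over paths as a string and returns $false$ when no path reaches the location.

```python
from typing import Dict, FrozenSet, List, Tuple

# One entry per path j reaching L: (type of the previous sub-statement S,
#                                   types of the flow-sensitively tracked variables)
PathEntry = Tuple[FrozenSet[str], Dict[str, FrozenSet[str]]]


def fmt_type(t: FrozenSet[str]) -> str:
    return "{" + ", ".join(sorted(t)) + "}"


def typing_assertion(entries: List[PathEntry]) -> str:
    """E_Ex(ty, L): one disjunct per path, each a conjunction of typing literals."""
    if not entries:
        return "false"  # empty disjunction: no path reaches L
    disjuncts = []
    for ret_type, var_types in entries:
        lits = [f"⌊r⌋ ∈ {fmt_type(ret_type)}"]
        lits += [f"⌊{x}⌋ ∈ {fmt_type(t)}" for x, t in sorted(var_types.items())]
        disjuncts.append("(" + " ∧ ".join(lits) + ")")
    return " ∨ ".join(disjuncts)
```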

The interested reader may find an exemplary translation for $V_{Ex}$-typings in Appendix D. Note that $\mathcal{E}_X$ allows using automatically derived type information in Hoare logic proofs even in theorem proving environments that trust only propositions for which they have verified a proof.


6.3 Two-Layered Proofs 

A two-layered proof is a Hoare logic proof for a type-safe notion of correctness of a dynamically typed program in which every assertion has the form $\tau \wedge p$ for a typing assertion $\tau$ and an assertion $p$. While $p$ is user-editable, $\tau$ is meant to be created and modified solely by an automated type safety verifier. We refer to $\tau$ as the “lower layer” and to $p$ as the “higher layer” of the proof/assertion.

Theorem 6 (Two-Layered Proof Construction). Given a typing proof $\varphi_l$ and a proof $\varphi_h$ for the same program $\pi$, it is always possible to construct a two-layered proof $\varphi$ with $\varphi_l$ as lower and $\varphi_h$ as higher layer.

Starting from a typing proof $\mathcal{E}_X(\pi, ty)$ in the lower layer and only $true$ in the higher layer, proofs in the higher layer are supported by type information from the lower layer (Section 4). The type information may also be refined:

Definition 6 (Refinement of Typing Proofs). Let $\psi = \mathcal{E}_X(\pi, ty)$ be a typing proof generated by a typing $ty$ of a program $\pi$. Then each conjunctive refinement step $ty \xrightarrow{\tau, L} ty'$ gives rise to a conjunctive proof refinement step $\psi \xrightarrow{\tau, L} \psi'$ with $\psi' = \mathcal{E}_X(\pi', ty')$.
* All Hoare triples in $\psi$ must contribute to establishing the conclusion.
Let $\psi_l = \mathcal{E}_X(\pi, ty)$ be the lower layer of a two-layered proof $\psi$. Then, whenever a typing literal $\tau$ appears within the higher layer $p$ of an assertion at program location $L$ in $\psi$, the lower-layer proof $\psi_l$ is substituted by the result $\psi'_l$ of the conjunctive proof refinement step $\psi_l \xrightarrow{\tau, L} \psi'_l$. In such refinements, $\mathcal{E}_X(\psi'_l, L') \rightarrow \mathcal{E}_X(\psi_l, L')$ holds for all $L'$ due to Theorem 5. Higher-layer proof steps depending on lower-layer information hence remain valid.

7 Verifying the Evaluator Example 

To demonstrate how the techniques developed enable the convenient verification of dynamically typed programs despite hard typing problems, we prove the evaluator example both type-safe and correct. Figure 8 shows all annotations* necessary to prove that calc() derives a given term’s value.

Type safety: The given invariant enables deriving the assertions on lines 2-4 and hence a proper typing of the remaining program. As a property of the ad-hoc data structure, it must be established in the (omitted) method parse(). With the types of env ($S \mapsto N$) and r ($N$) known, their mapping can be automated. The complex ad-hoc data structure tree is given the (imprecise) type $N \mapsto O$, and its elements hence need manual mapping. These mappings are encapsulated in predicates (valuetree2, vartree3, optree5) and can otherwise be ignored.

Correctness (marked): The lower-layer information allows identifying numerous pure expressions (tree[1], env[tree[1]], tree[1] == ADD, etc.). Establishing the specified property for the first two branches then only requires applying PURE EXPR. The conditional in line 5 can be handled by PURE COND. Since all arguments to the recursive method calls in that line are also pure,

{optree2(tree, env) ∧ tree[1] = ADD} calc(env, tree[2]) + calc(env, tree[3]) {treeval3(tree, env, r)}

can be derived automatically. Note that all implications can be handled by SMT solvers with theories for Presburger arithmetic and lists, for which effective decision procedures are known, and do not require reasoning about graph-like object structures.

8 Related Work 

There are several threads of related work regarding dynamically typed programs. In each, we can only discuss the works most closely related to ours.

Type Safety: Cartwright [6] pioneered a strand of work called “soft typing”, applying automated type safety verifiers to dynamically typed languages with the aim of improving performance. Another line of work is “gradual typing” [26,4], letting the user decide which parts of the program should be statically checked for type errors while the remaining program is dynamically typed.**

* Again, all recursive predicates can instead be expressed using quantification over sequences, at the expense of readability.
** The two may also be combined [24].
def valuetree2(t, n) = N(t[0], VALUE) ∧ N(t[1], n)
def valuetree1(t) = ∃n. valuetree2(t, n)
def vartree3(t, e, x) = N(t[0], VAR) ∧ S(t[1], x) ∧ e[x] ≠ null
def vartree2(t, e) = ∃x. vartree3(t, e, x)
def optree5(t, e, op, l, r) = N(t[0], OP) ∧ N(t[1], op) ∧ L(t[2], l) ∧ parsetree2(l, e) ∧ L(t[3], r) ∧ parsetree2(r, e)
def optree2(t, e) = ∃op, l, r. optree5(t, e, op, l, r)
def parsetree2(t, e) = valuetree1(t) ∨ vartree2(t, e) ∨ optree2(t, e)
def treeval3(tree, env, n) =
    valuetree2(tree, n) ∨
    ∃x. vartree3(tree, env, x) ∧ env[x] = n ∨
    ∃op, l, r, n_l, n_r. optree5(tree, env, op, l, r) ∧ treeval3(l, env, n_l) ∧ treeval3(r, env, n_r) ∧
        (op = ADD → n = n_l + n_r) ∧ ...
inv ∀t, e. parsetree2(t, e) →
    (t[0] = VALUE → valuetree1(t)) ∧ (t[0] = VAR → vartree2(t, e)) ∧ (t[0] = OP → optree2(t, e))

{parsetree2(tree, env)}
1  method calc(env, tree = @tree) {
2    if tree[0] = VALUE then {valuetree1(tree)} tree[1]
3    elseif tree[0] = VAR then {vartree2(tree, env)} env[tree[1]]
4    elseif tree[0] = OP then {optree2(tree, env)}
5      if tree[1] = ADD then calc(env, tree[2]) + calc(env, tree[3])
6      elseif ...
7      else nil
8      fi
9    else nil
10   fi
11 }
{treeval3(tree, env, r)}

Fig. 8. Correctness proof for the evaluator example
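For experimentation, the evaluator and its specification can be transcribed into Python. This sketch is hypothetical in several respects: the tag constants `VALUE`, `VAR`, `OP` and the operator code `ADD` are given arbitrary values, trees are Python lists, $N$ is modelled by non-negative integers, `None` plays the role of nil, and only the ADD case of treeval3 is covered. It tests, rather than proves, that `calc` returns a value related to the tree by treeval3 whenever parsetree2 holds.

```python
from typing import Dict, Optional

VALUE, VAR, OP = 0, 1, 2  # hypothetical tag constants
ADD = 0                   # hypothetical operator code

Env = Dict[str, int]


def is_nat(v: object) -> bool:
    """N(o, v) is modelled as: v is a non-negative Python int (bool excluded)."""
    return isinstance(v, int) and not isinstance(v, bool) and v >= 0


def parsetree(t: object, env: Env) -> bool:
    """Executable reading of parsetree2(t, e)."""
    if not isinstance(t, list) or not t:
        return False
    if t[0] == VALUE:                                   # valuetree1
        return len(t) >= 2 and is_nat(t[1])
    if t[0] == VAR:                                     # vartree2
        return len(t) >= 2 and isinstance(t[1], str) and env.get(t[1]) is not None
    if t[0] == OP:                                      # optree2
        return (len(t) >= 4 and is_nat(t[1])
                and parsetree(t[2], env) and parsetree(t[3], env))
    return False


def calc(env: Env, tree: list) -> Optional[int]:
    """The evaluator of Fig. 8; None plays the role of nil."""
    if tree[0] == VALUE:
        return tree[1]
    elif tree[0] == VAR:
        return env[tree[1]]
    elif tree[0] == OP:
        if tree[1] == ADD:
            return calc(env, tree[2]) + calc(env, tree[3])
        return None  # further operators elided, as in Fig. 8
    return None


def treeval(tree: list, env: Env, n: int) -> bool:
    """Executable reading of treeval3 for ADD only; the existential over n_l, n_r
    is decided by enumeration, since n = n_l + n_r over N forces n_l <= n."""
    if not is_nat(n):
        return False
    if tree[0] == VALUE:
        return tree[1] == n
    if tree[0] == VAR:
        return env.get(tree[1]) == n
    if tree[0] == OP and tree[1] == ADD:
        return any(treeval(tree[2], env, nl) and treeval(tree[3], env, n - nl)
                   for nl in range(n + 1))
    return False
```

For example, with `env = {"x": 3}` the tree `[OP, ADD, [VALUE, 2], [OP, ADD, [VAR, "x"], [VALUE, 4]]]` satisfies `parsetree`, and `calc` returns 9, a value accepted by `treeval`.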
Both soft and gradual typing require rewriting of program parts in exchange for type safety guarantees. In contrast, our approach is able to provide such guarantees also for parts that are not statically typable. Also, the user is free to omit type safety from the specification (dynamic typing) and may still rewrite the program to allow automatic checking (static typing), both on a per-expression basis (gradual). With respect to correctness, our approach hence subsumes both soft and gradual typing. However, it does not (yet) increase performance.

Others [13,20] have extended such abstraction-based verifiers to handle many idioms common in dynamically typed languages.

Correctness: To the best of our knowledge, [12] currently is the only* axiomatic semantics for a type-safe notion of correctness of a dynamically typed language. As discussed in Section 2.1, it uses type safety preconditions, considers all variables to be of object type and does not use pure expressions; it would thus benefit from our approach.

Nguyen et al. [21] proposed an automatic contract verifier for untyped higher-order functional languages based on symbolic execution, which inserts run-time checks for contracts it cannot statically guarantee. Since they use a mechanism similar to widening to enforce termination, their approach also combines abstraction-based and symbolic reasoning.

Drawing on their work on the verification of untyped higher-order functional programs [9], Chugh et al. [8,7] provide a dependent type system for an untyped functional “core calculus” $\lambda_{JS}$ into which JavaScript programs can be translated. No soundness is demonstrated for their system.

Swamy et al. [29] semi-automatically reason about a wide range of JavaScript idioms by translating into the dependently typed functional language F* and using its SMT-based reasoning engine. They also noticed that the type information generated by an abstraction-based type safety verifier (GateKeeper in their case) is useful to improve the effectiveness of automatic reasoning engines. However, they did not feed the symbolically derived proof results back into GateKeeper and did not use the type information to ease the annotation burden for their users. Since their main focus lies on a novel encoding of Dijkstra’s predicate transformer semantics allowing F*’s dependent type inference to effectively reason about imperative programs in a style similar to Hoare logic, we consider the approaches to be largely complementary.

In general, all fully automatic approaches [6,21,13,9] are necessarily incomplete. They can, however, be used as automatic type safety verifiers. Furthermore, all purely symbolic approaches [9,8,29,12,23] require all type information to be manually specified in method contracts and loop invariants.

Both the idea and the term “Layer of abstraction” are inspired by the work of 
Gardner et al. [12] on reasoning about JavaScript. However, their work abstracts 
from the peculiarities of the JavaScript variable store, while ours abstracts from 
the complexity of dynamic typing and is applicable to virtually any dynamically 
typed language. 
* [23] only treat partial correctness. Also, they restrict the programming language to allow a form of (type-unsafe) pure expressions.
The decomposition rule used to establish the layer of abstraction is inspired by similar constructions in [1].

Some tools for verification of statically typed imperative programs [10] allow 
using a “pure” subset of the programming language (that is side-effect-free and 
guaranteed to terminate) within assertions. The ability of our layer of abstraction 
to allow the use of well-typed “pure” program expressions in assertions can be 
seen as an extension of this idea to dynamically typed programs. 

Combining Static Analysis with Program Logics: There has been a considerable amount of work on integrating algorithmic decision procedures (mostly model checking) and deductive methods for program verification (see [31] for pointers). Due to the deep connection between data flow analysis and model checking [25], many of these techniques can be considered related.

Note that our conjunctive refinement differs from abstraction refinement since 
it is not the abstraction that is refined, but the analysis result. 

Also, translations from typings (e.g. from type systems for information-flow properties) to program logics are commonly used in the Proof-Carrying Code community [14] to avoid the need for property-specific proof checkers. Although PCC is a completely different application area, its aim was also to integrate results derived by different inference systems into one common representation, and incidentally it also chose a program logic as its “lingua franca”.

A closely related proposal also integrating symbolic with abstraction-based reasoning is MIXY [19], a framework for mixing symbolic execution with type checking. In their system, the user divides his/her program into s-blocks and t-blocks. While s-blocks are analysed using symbolic execution, type analysis is applied to t-blocks. The results of both analyses are bidirectionally exchanged using so-called MIX-rules: type analysis results are translated into a matching start environment for symbolic execution, and types ensured by (exhaustive!) symbolic execution can be used for type analysis. The aim is also related: what Phang et al. call “balancing precision vs. efficiency” is the same as “combining automation with completeness”, although Phang et al. do not prove their system complete. Our approach could most likely be integrated into their framework as “Hoare-logic blocks” $\{h\ e\ h\}$ with typing $\Gamma \vdash \{h\ e\ h\} : \tau$, for which a Hoare triple $\{p_e\}e\{q_e\}$ must be derived where $p_e = \mathcal{E}_X(\Gamma)$ and $pr^r_X(q_e) = \tau$ for some verifier $V_X$.

9 Conclusion & Future Work

The approach presented allows verifying dynamically typed programs just like statically typed ones, requiring manual assistance only on hard typing problems. If a program is statically typable, there is no difference. Otherwise, the user may freely choose whether going beyond the limits of the type system is worth the verification effort. While the stated requirements for automated verifiers allow conveniently using the technique, more powerful verifiers can be expected to significantly increase the degree of automation. Being gradually applicable like gradual typing and automated like soft typing, the approach allows deriving type safety guarantees also for code parts that cannot be statically typed. Should it turn out to be practically usable, it would suggest dynamic typing as a serious alternative to static typing for verifiable languages that have all necessary infrastructure readily available. Until then, there are several useful ways to extend this work:

Completeness: Currently, there is no relatively complete Hoare logic for a dynamically typed language. While the approach is also applicable to incomplete program logics, no completeness guarantee can be provided in this case.

Other Program Logics: [8,29] are both based on refinement types. Recently, it has been shown [30] how to extend such systems to provide relative completeness like Hoare logic. It would be interesting to investigate whether our approach is also applicable to such program logics.

Other Program Analyses: The formalization of semi-automation suggests 
that it could be generalizable to arbitrary data flow analyses. 

Performance: The derived type information could be used to omit run-time 
checks and generate more efficient binaries. 

Features: The current concept excludes optional variables and type errors as 
exceptions. Also, closures and advanced dynamic features like method update, 
dynamic type hierarchies and eval() should be studied. 

Implementation: An implementation would allow evaluating the practical usefulness of the concept.

Acknowledgements We thank Sven Linker, Martin Hilscher and Eike Best for 
insightful discussions and useful comments on drafts of this paper. 
A Appendix: Identifying and Translating Pure Expressions

Identifying pure expressions ($pure : \mathcal{T}_S \times Expr_{dyn} \mapsto \mathbb{B}$)
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pure{e) = T(e) defined, T(e) = m/({T | pMre(T, e)}) 

pure{NuU,mil\) = true, Ve € Expr^. pure{NuU, e) pure{ 0 ,e) 

pure{T,u) = |u] G T A E{T) = T A safej{u) (includes the case of this) 
pure{T, @x) = |this.@x] G T A E(T) = T A safej(@x) 
pure(]B, e is_a? C) = pure{ 0 , e) 

pure{M>, Cl == 62) = 3 T G Ts- pure{T, ei) A pure{T, 62) 
pitre(T, eo.m(ei,...,e„)) = 

3 To, pure{Ti, d) for f G N„ A 'f'(To.ni(Ti,T„) —>• T) defined 

pureiT, new C(ei,e„)) = 

3 Ti,...,T„. pMre(Tj,ei) for i G A il'(tP'({C}).init(Ti,T„) T) 

defined 

pureiT, u := e) = false 
pure{T, @v := e) = false 
pMre(T, while e do 5 ” od) = false 

pure{T,ii e then else S2 h) = pure{M,e) A Si = A pure(T,ei) for i G 

{ 1 , 2 } 
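As a minimal sketch of the structural recursion in the definition of pure above, the following Python function works on a hypothetical tuple encoding of dyn expressions. The type-based side conditions (the conditions on runtime types and safe_T, and the definedness of the mapped operation) are not modelled: they are delegated to a caller-supplied predicate `typed_ok`, which is an assumption of this sketch, and the types $T_i$ themselves are not computed.

```python
from typing import Any, Callable, Tuple

# Hypothetical tuple encoding of dyn expressions (first element = node kind):
#   ("null",) ("var", u) ("ivar", v) ("is_a", e, C) ("eq", e1, e2)
#   ("call", e0, m, [e1, ..., en]) ("new", C, [e1, ..., en])
#   ("asgn", u, e) ("iasgn", v, e) ("while", e, S) ("if", e, e1, e2)
Node = Tuple[Any, ...]


def is_pure(e: Node, typed_ok: Callable[[Node], bool]) -> bool:
    """Structural part of pure: assignments and loops are never pure; the
    remaining constructs are pure if their sub-expressions are, with all
    type-based side conditions delegated to typed_ok."""
    kind = e[0]
    if kind in ("asgn", "iasgn", "while"):
        return False
    if kind == "null":
        return True
    if kind in ("var", "ivar"):
        return typed_ok(e)  # stands in for the runtime-type and safe_T conditions
    if kind == "is_a":
        return is_pure(e[1], typed_ok)
    if kind == "eq":
        return is_pure(e[1], typed_ok) and is_pure(e[2], typed_ok)
    if kind == "call":
        subs = [e[1], *e[3]]  # receiver followed by the arguments
        return all(is_pure(s, typed_ok) for s in subs) and typed_ok(e)
    if kind == "new":
        return all(is_pure(s, typed_ok) for s in e[2]) and typed_ok(e)
    if kind == "if":
        # condition and both branches must themselves be pure expressions
        return all(is_pure(s, typed_ok) for s in e[1:4])
    raise ValueError(f"unknown node kind: {kind!r}")
```

For example, `x == null` is accepted as long as `typed_ok` accepts `x`, whereas any conditional containing an assignment in one of its branches is rejected regardless of typing.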

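Translation of pure expressions into logical expressions ($\mathcal{E} : \mathcal{T}_s \times \mathit{Expr}_{dyn} \to \mathit{LExp}$)

$\mathcal{E}(e) = \mathcal{E}_{T(e)}(e)$, $\quad \mathcal{E}_T(x) = x$

$\mathcal{E}_{\mathit{Null}}(\mathtt{null}) = \mathtt{null}$, $\quad (\mathcal{E}_{\mathit{Null}}(e) = p) \Rightarrow (\mathcal{E}_{\mathbb{O}}(e) = p)$

$\mathcal{E}_T(e_0.m(e_1,\dots,e_n)) = l[v_0,\dots,v_n := \mathcal{E}_{T_0}(e_0),\dots,\mathcal{E}_{T_n}(e_n)]$, where $\mathit{pure}(T_i, e_i)$ for $i \in \mathbb{N}_n$ and $\Phi(T_0.m(T_1,\dots,T_n) \to T) = l$.

$\mathcal{E}_T(\mathtt{new}\ C(e_1,\dots,e_n)) = l[v_1,\dots,v_n := \mathcal{E}_{T_1}(e_1),\dots,\mathcal{E}_{T_n}(e_n)]$, where $\mathit{pure}(T_i, e_i)$ for $1 \le i \le n$ and $\Phi(\Phi(\{C\}).\mathit{init}(T_1,\dots,T_n) \to T) = l$.

$\mathcal{E}_{\mathbb{B}}(e_1 \mathbin{==} e_2) = (\mathcal{E}_T(e_1) = \mathcal{E}_T(e_2))$, where $\mathit{pure}(T, e_i)$ for $i \in \{1, 2\}$.

$\mathcal{E}_{\mathbb{B}}(e\ \mathtt{is\_a?}\ C) = \lceil \mathcal{E}_{\mathbb{O}}(e) \rceil \in \{C\}$

$\mathcal{E}_T(\mathtt{if}\ e\ \mathtt{then}\ e_1\ \mathtt{else}\ e_2\ \mathtt{fi}) = \mathtt{if}\ \mathcal{E}_{\mathbb{B}}(e)\ \mathtt{then}\ \mathcal{E}_T(e_1)\ \mathtt{else}\ \mathcal{E}_T(e_2)\ \mathtt{fi}$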
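A matching sketch of the translation into logical expressions, producing formulas as plain strings, uses the same hypothetical tuple encoding. The table `ops` is an assumption standing in for the mapping of methods to logical operations: it maps a method name to a template over placeholders `{v0}`, ..., `{vn}` (with `v0` the receiver), so that the substitution $l[v_0,\dots,v_n := \ldots]$ becomes string formatting. The `new` case and the Null/Object distinction are omitted.

```python
from typing import Any, Dict, Tuple

Node = Tuple[Any, ...]  # same hypothetical tuple encoding of dyn expressions


def translate(e: Node, ops: Dict[str, str]) -> str:
    """Translate a pure expression into a logical expression (as a string).

    ops is a hypothetical table standing in for the mapping of methods to
    logical operations: method name -> template over {v0}, ..., {vn}."""
    kind = e[0]
    if kind == "null":
        return "null"
    if kind == "var":
        return e[1]
    if kind == "ivar":
        return f"this.{e[1]}"
    if kind == "eq":
        return f"({translate(e[1], ops)} = {translate(e[2], ops)})"
    if kind == "is_a":
        return f"type({translate(e[1], ops)}) in {{{e[2]}}}"
    if kind == "call":
        args = [translate(a, ops) for a in (e[1], *e[3])]
        # l[v0, ..., vn := E(e0), ..., E(en)] realised as template substitution
        return ops[e[2]].format(**{f"v{i}": a for i, a in enumerate(args)})
    if kind == "if":
        c, a, b = (translate(x, ops) for x in e[1:4])
        return f"(if {c} then {a} else {b} fi)"
    raise ValueError(f"not handled in this sketch: {kind!r}")
```

With `ops = {"+": "{v0} + {v1}"}`, the call `x.+(y)` translates to the logical term `x + y`.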


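Proof Rules for pure expressions

RULE: PURE LOOP ([strong] [type-safe] partial correctness)

$$\frac{\{p \wedge \mathcal{E}(e)\}\ S\ \{p\}}{\{p\}\ \mathtt{while}\ e\ \mathtt{do}\ S\ \mathtt{od}\ \{p \wedge \neg\mathcal{E}(e) \wedge r = \mathtt{null}\}}$$

where $\mathit{pure}(e)$ and $T(e) = \mathbb{B}$.

RULE: PURE METH

$$\frac{\{p\}\ u_0.m(u_1,\dots,u_n)\ \{q\}}{\{p[u_0,\dots,u_n := \mathcal{E}(e_0),\dots,\mathcal{E}(e_n)]\}\ e_0.m(e_1,\dots,e_n)\ \{q\}}$$

where the $u_i \in V_L$ are fresh and $\mathit{pure}(e_i)$ holds for all $i \in \mathbb{N}_n$.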









Soundness 

Proof. The axiom PURE EXPR can be established by induction over the structure of the pure expression $e$, using the guarantees provided by $\mathit{pure}(e)$. In the cases for variables, $\mathit{pure}(x)$ implies $\mathit{safe}_T(x)$ for some type $T$. In the case for method calls, we assume

$$\{p[r := \Phi(T_0.m(T_1,\dots,T_n) \to T)]\}\ u_0.m(u_1,\dots,u_n)\ \{p\}$$

with $T_i = T(u_i)$ for all $i \in \mathbb{N}_n$ to be established for all methods in $\mathcal{M}$ (which is precisely the meaning of the "correspondence between methods and operations with respect to the mapping" in Section 4.3).

The rules PURE ASGN, PURE COND, PURE LOOP and PURE METH can then be derived by combining the axiom PURE EXPR with the dyn rules for ASGN, COND, LOOP and METH, respectively. □

B Appendix: Axiomatic Semantics for dyn 

AXIOM: CONST

$$\{p[r := \mathtt{null}]\}\ \mathtt{null}\ \{p\}$$

AXIOM: VAR

$$\{p[r := u]\}\ u\ \{p\}$$

Note: includes the case of $u = \mathtt{this}$.

AXIOM: IVAR

$$\{p[r := \mathtt{this}.@v]\}\ @v\ \{p\}$$

RULE: ASGN (both normal and instance variables)

$$\frac{\{p\}\ e\ \{q[u := r]\}}{\{p\}\ u := e\ \{q\}}$$

where $u \in V$.

RULE: SEQ

$$\frac{\{p\}\ S_1\ \{r\} \qquad \{r\}\ S_2\ \{q\}}{\{p\}\ S_1; S_2\ \{q\}}$$


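RULE: Conditional (COND) ([strong] [type-safe] partial correctness)

$$\frac{\{p\}\ e\ \{r \wedge \mathit{bool\_test}\} \qquad \{r \wedge b\}\ S_1\ \{q\} \qquad \{r \wedge \neg b\}\ S_2\ \{q\}}{\{p\}\ \mathtt{if}\ e\ \mathtt{then}\ S_1\ \mathtt{else}\ S_2\ \mathtt{fi}\ \{q\}}$$

where $b$ is a predicate and

$$\mathit{bool\_test} \equiv \underbrace{[r \neq \mathtt{null} \wedge]}_{\text{strong}}\ \underbrace{[\lceil r \rceil \in \{\mathit{bool}\} \wedge]}_{\text{type-safe}}\ B(r, b)$$

RULE: LOOP ([strong] [type-safe] partial correctness)

$$\frac{\{p\}\ e\ \{p \wedge r \wedge \mathit{bool\_test}\} \qquad \{p \wedge r \wedge b\}\ S\ \{p\}}{\{p\}\ \mathtt{while}\ e\ \mathtt{do}\ S\ \mathtt{od}\ \{p \wedge \neg b \wedge r = \mathtt{null}\}}$$

where $b$ is a predicate and $\mathit{bool\_test}$ is defined as in COND.

RULE: CONS

$$\frac{p \to p_1 \qquad \{p_1\}\ S\ \{q_1\} \qquad q_1 \to q}{\{p\}\ S\ \{q\}}$$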


RULE: PASGN

$$\{p[\vec{u} := \vec{t}]\}\ \vec{u} := \vec{t}\ \{p\}$$

where $\{\vec{u}\} \subseteq V_L$ and $\{\vec{t}\} \subseteq V_L \cup \{\mathtt{null}\}$.

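RULE: BLCK

$$\frac{\{p\}\ \vec{u}, \bar{u} := \vec{t}, \overline{\mathtt{null}};\ S\ \{q\}}{\{p\}\ \mathtt{begin\ local}\ \vec{u} := \vec{t};\ S\ \mathtt{end}\ \{q\}}$$

when $(V_L \setminus \{r\}) \cap \mathit{free}(q) = \emptyset$, where $\{\vec{u}\} \subseteq V_L$, $\{\vec{t}\} \subseteq V_L \cup \{\mathtt{null}\}$, $\{\bar{u}\} = V_L \setminus (\{\vec{u}\} \cup V_g)$ and $|\overline{\mathtt{null}}| = |\bar{u}|$.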
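RULE: METH

$$\frac{\{p_i\}\ e_i\ \{p_{i+1}[u_i := r]\} \text{ for } i \in \mathbb{N}_n \qquad \{p_{n+1}\}\ u_0.m(u_1,\dots,u_n)\ \{q\}}{\{p_0\}\ e_0.m(e_1,\dots,e_n)\ \{q\}}$$

where the $u_i$ are fresh, $u_i \in V_L$ and $u_i \notin \mathit{var}(e_j) \cup \mathit{change}(e_j)$ for all $i, j \in \mathbb{N}_n$.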
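RULE: REC ([strong] [type-safe] partial correctness)

$$\frac{\begin{array}{c} A \vdash \{p\}\ S\ \{q\}, \\ A \vdash \{p_i\}\ \mathtt{begin\ local}\ \mathtt{this}, \vec{u}_i := v_i', \vec{v}_i;\ S_i\ \mathtt{end}\ \{q_i\},\ i \in \{1,\dots,n\}, \\ p_i \to ([v_i' \neq \mathtt{null} \wedge]\ \ldots),\ i \in \{1,\dots,n\} \end{array}}{\{p\}\ S\ \{q\}}$$

where $\mathtt{method}\ m_i(\vec{u}_i)\{S_i\} \in \mathcal{M}_{C_i}$ and $A = \{p_1\}\ v_1'.m_1(\vec{v}_1)\ \{q_1\}, \dots, \{p_n\}\ v_n'.m_n(\vec{v}_n)\ \{q_n\}$.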
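RULE: CNSTR

$$\frac{\{p\}\ \mathtt{new}_C.\mathit{init}(\vec{e})\ \{q\}}{\{p\}\ \mathtt{new}\ C(\vec{e})\ \{q\}}$$

AXIOM: NEW

$$\{p[r := \mathtt{new}_C]\}\ \mathtt{new}_C\ \{p\}$$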






















B.1 Auxiliary Rules

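RULE: DISJ

$$\frac{\{p\}\ S\ \{q\} \qquad \{r\}\ S\ \{q\}}{\{p \vee r\}\ S\ \{q\}}$$

RULE: CONJ

$$\frac{\{p_1\}\ S\ \{q_1\} \qquad \{p_2\}\ S\ \{q_2\}}{\{p_1 \wedge p_2\}\ S\ \{q_1 \wedge q_2\}}$$

RULE: ∃-INT

$$\frac{\{p\}\ S\ \{q\}}{\{\exists x.\, p\}\ S\ \{q\}}$$

where $x \notin \mathit{var}(\mathcal{M}) \cup \mathit{var}(S) \cup \mathit{free}(q)$.

RULE: INV

$$\frac{\{r\}\ S\ \{q\}}{\{p \wedge r\}\ S\ \{p \wedge q\}}$$

where $\mathit{free}(p) \cap (\mathit{change}(\mathcal{M}) \cup \mathit{change}(S)) = \emptyset$ and $p$ does not contain quantification over objects.

RULE: SUBST

$$\frac{\{p\}\ S\ \{q\}}{\{p[\vec{z} := \vec{t}]\}\ S\ \{q[\vec{z} := \vec{t}]\}}$$

where $\mathit{var}(\vec{z}) \cap (\mathit{var}(\mathcal{M}) \cup \mathit{var}(S)) = \mathit{var}(\vec{t}) \cap (\mathit{change}(\mathcal{M}) \cup \mathit{change}(S)) = \emptyset$.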

C Appendix: Automatic Type Safety Verifier $V_{Ex}$

To allow for intra-procedural flow-sensitivity, all statements $S$ are converted to static single-assignment form (SSA) for local as well as instance variables of the current object. This necessitates that each occurrence of such a variable $x$ having some number of assignments, say $n$, is replaced by one of its $k \geq n$ "versions" $x_1,\dots,x_k$ such that each version has exactly one assignment dominating all its occurrences (except occurrences in φ-assignments: for $k > n$ versions, so-called φ-assignments are inserted at control-flow joins, and the occurrences in such assignments need not be dominated). We maintain a mapping $\eta(x, L)$ from variables $x \in V$ and program locations $L$ of $\pi$ to the version $x_i$ whose assignment dominates this location.

Next, each sub-statement $S$ of the program $\pi$ is given a type variable $\llbracket S \rrbracket$. For each method $C.m$, additional type variables $\llbracket P^i_{C.m} \rrbracket$ for each of its parameters, $\llbracket R_{C.m} \rrbracket$ for its return value and $\llbracket u_i \rrbracket$ for each version $u_i$ of each local variable $u$ are added. Instance variables $@v \in V_C$ are given both a global type variable $\llbracket V_C.@v \rrbracket$ and type variables $\llbracket @v_i \rrbracket$ for each version $@v_i$ in the methods $C.m$ using them.
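The renaming into versions can be sketched for straight-line code, where every assignment trivially dominates the rest of the sequence and no φ-assignments are needed. In this hypothetical Python helper, each statement is a pair (assigned variable, variables used on the right-hand side), locations are statement indices, and a variable used before any assignment (e.g. a parameter) is given version `x0`; the returned dictionary `eta` plays the role of $\eta(x, L)$ for each use.

```python
from typing import Dict, List, Tuple

Stmt = Tuple[str, List[str]]  # (assigned variable, variables used on the right-hand side)


def to_ssa(stmts: List[Stmt]) -> Tuple[List[Stmt], Dict[Tuple[str, int], str]]:
    """Rename a straight-line sequence of assignments into SSA versions.

    Returns the renamed statements and eta, where eta[(x, L)] is the version
    of x whose assignment dominates the use of x in statement L.  Variables
    used before any assignment get version x0.  Control-flow joins, and hence
    phi-assignments, are not handled in this sketch.
    """
    counter: Dict[str, int] = {}
    current: Dict[str, str] = {}
    eta: Dict[Tuple[str, int], str] = {}
    renamed: List[Stmt] = []
    for loc, (target, uses) in enumerate(stmts):
        for x in uses:
            # a use sees the version assigned most recently before this statement
            eta[(x, loc)] = current.get(x, x + "0")
        new_uses = [eta[(x, loc)] for x in uses]
        counter[target] = counter.get(target, 0) + 1
        current[target] = f"{target}{counter[target]}"
        renamed.append((current[target], new_uses))
    return renamed, eta
```

For instance, `x := ...; y := x; x := x + y` becomes `x1 := ...; y1 := x1; x2 := x1 + y1`, so each version has exactly one assignment.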

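The constraint generation then proceeds as follows: for every method declaration $\mathtt{method}\ m(u_1,\dots,u_n)\{S_{C.m}\}$ of class $C$, we generate the following constraints:

$$\llbracket P^i_{C.m} \rrbracket \sqsubseteq \llbracket u_i \rrbracket \text{ for all } i \in \{1,\dots,n\}, \qquad \llbracket S_{C.m} \rrbracket \sqsubseteq \llbracket R_{C.m} \rrbracket, \qquad \llbracket V_C.@v \rrbracket \sqsubseteq \llbracket @v_0 \rrbracket \text{ for all } @v \text{ used in } C.m$$

Additionally, we traverse the parse tree of the body $S_{C.m}$ of the method $C.m$, applying the rules given in Figure 9.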


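In the body of a method $C.m$:

$\mathtt{null} \Rightarrow \llbracket \mathtt{null} \rrbracket = \{\}$
$u_i \Rightarrow \llbracket u_i \rrbracket = \ldots$
$@v_i \Rightarrow \llbracket @v_i \rrbracket = \ldots$
$\mathtt{this} \Rightarrow \llbracket \mathtt{this} \rrbracket = \{C\}$
$u_i := e \Rightarrow \llbracket e \rrbracket \sqsubseteq \llbracket u_i \rrbracket \wedge \llbracket u_i := e \rrbracket = \llbracket e \rrbracket$
$@v_i := e \Rightarrow \llbracket e \rrbracket \sqsubseteq \llbracket @v_i \rrbracket \sqsubseteq \llbracket V_C.@v \rrbracket \wedge \llbracket @v_i := e \rrbracket = \llbracket e \rrbracket$
$e.m(e_1,\dots,e_n) \Rightarrow \llbracket e \rrbracket \sqsubseteq \llbracket m(\llbracket e_1 \rrbracket,\dots,\llbracket e_n \rrbracket) \to \llbracket e.m(\dots) \rrbracket \rrbracket$
$S; e \Rightarrow \llbracket S; e \rrbracket = \llbracket e \rrbracket$
$\mathtt{if}\ e\ \mathtt{then}\ S_1\ \mathtt{else}\ S_2\ \mathtt{fi} \Rightarrow \llbracket e \rrbracket \sqsubseteq \{\mathit{bool}\} \wedge \llbracket S_1 \rrbracket \sqcup \llbracket S_2 \rrbracket \sqsubseteq \llbracket \mathtt{if} \dots \mathtt{fi} \rrbracket$
$\mathtt{while}\ e\ \mathtt{do}\ S\ \mathtt{od} \Rightarrow \llbracket e \rrbracket \sqsubseteq \{\mathit{bool}\} \wedge \llbracket \mathtt{while} \dots \mathtt{od} \rrbracket = \{\}$
$\mathtt{new}\ C(e_1,\dots,e_n) \Rightarrow \{C\} \sqsubseteq \llbracket \mathit{init}(\llbracket e_1 \rrbracket,\dots,\llbracket e_n \rrbracket) \to \llbracket \mathtt{new}\ C(\dots) \rrbracket \rrbracket$
$\phi(e_1,\dots,e_n) \Rightarrow \llbracket e_1 \rrbracket \sqcup \dots \sqcup \llbracket e_n \rrbracket \sqsubseteq \llbracket \phi(e_1,\dots,e_n) \rrbracket$
$e \sqcap T \Rightarrow \llbracket e \rrbracket \sqcap T \sqsubseteq \llbracket e \sqcap T \rrbracket$

Fig. 9. $V_{Ex}$ typing rules for dyn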


The resulting system of set inclusion constraints can be solved by propagating the type information from constructor calls forwards along the data flow. Whenever a type $C$ reaches a type variable of the form $\llbracket m(T_1,\dots,T_n) \to R \rrbracket$ (generated by a method call), the method $C.m$ of arity $n$ is looked up and the following connection constraints are added:

$$T_i \sqsubseteq \llbracket P^i_{C.m} \rrbracket \text{ for all } i \in \{1,\dots,n\}, \qquad \llbracket R_{C.m} \rrbracket \sqsubseteq R$$

Upon reaching a fixpoint, the analysis provides a solution $ty$ (a typing) mapping each type variable (and thus every sub-statement, variable, parameter and return value) to a union type in $\mathcal{R}$. To mask the initial conversion to SSA, we define $ty(x) = \bigcup_i ty(x_i)$ for all variables $x$ tracked flow-sensitively.
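The propagation described above is a worklist fixpoint computation. The following Python sketch is one way to realise it under simplifying assumptions: type variables are plain strings, `seeds` holds the facts contributed by constructor calls, `edges` holds inclusion constraints, and each method call is recorded as (receiver, method, argument type variables, result type variable). All names and the encoding are hypothetical, and failed method lookups, which correspond to potential type errors, are simply skipped rather than reported.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set, Tuple


def solve(
    seeds: Dict[str, Set[str]],                              # e.g. {C} <= [[new C(...)]]
    edges: List[Tuple[str, str]],                            # (a, b) means [[a]] <= [[b]]
    calls: List[Tuple[str, str, List[str], str]],            # (receiver, method, args, result)
    methods: Dict[Tuple[str, str], Tuple[List[str], str]],   # (C, m) -> (params, return)
) -> Dict[str, Set[str]]:
    ty: Dict[str, Set[str]] = defaultdict(set)
    succ: Dict[str, Set[str]] = defaultdict(set)
    work = deque()

    def add_types(v: str, types: Set[str]) -> None:
        new = set(types) - ty[v]
        if new:
            ty[v] |= new
            work.append(v)

    def add_edge(a: str, b: str) -> None:
        if b not in succ[a]:
            succ[a].add(b)
            work.append(a)  # re-propagate a's current types along the new edge

    for v, types in seeds.items():
        add_types(v, types)
    for a, b in edges:
        add_edge(a, b)
    calls_on: Dict[str, list] = defaultdict(list)
    for recv, meth, args, res in calls:
        calls_on[recv].append((meth, args, res))

    connected = set()
    while work:
        v = work.popleft()
        for w in list(succ[v]):
            add_types(w, ty[v])
        for meth, args, res in calls_on[v]:
            for cls in list(ty[v]):
                key = (cls, meth, tuple(args), res)
                if key in connected:
                    continue
                connected.add(key)
                if (cls, meth) not in methods:
                    continue  # lookup failure: a potential type error, not modelled here
                params, ret = methods[(cls, meth)]
                if len(params) != len(args):
                    continue  # arity mismatch, likewise not modelled
                for a, p in zip(args, params):
                    add_edge(a, p)  # connection constraint T_i <= [[P^i_{C.m}]]
                add_edge(ret, res)  # connection constraint [[R_{C.m}]] <= R
    return {v: set(t) for v, t in ty.items()}
```

Since type sets only grow and edges are only ever added, the loop terminates; starting from empty sets, it should yield the smallest typing satisfying both the generated constraints and the connection constraints added along the way.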

A typing $ty$ is called consistent iff for every constraint $\llbracket S_1 \rrbracket \sqsubseteq \llbracket S_2 \rrbracket$ it holds that $ty(\llbracket S_1 \rrbracket) \sqsubseteq ty(\llbracket S_2 \rrbracket)$. An important property of typing rules is that consistency implies soundness. Note that the constraints marked in Figure 9 serve to ensure sufficient precision ($ty \sqsubseteq ty_t$) rather than soundness. When these constraints are omitted, the algorithm still outputs a sound typing, even when its precision is insufficient to establish type safety.
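Both the consistency condition and the union over SSA versions used to report $ty(x)$ amount to simple set operations. A minimal Python sketch, assuming typings are dictionaries from (hypothetical) type-variable names to sets of class names:

```python
from typing import Dict, Iterable, List, Set, Tuple

Typing = Dict[str, Set[str]]  # type variable -> union type (set of class names)


def is_consistent(ty: Typing, constraints: Iterable[Tuple[str, str]]) -> bool:
    """Check ty([[S1]]) <= ty([[S2]]) for every constraint [[S1]] <= [[S2]];
    type variables missing from ty are treated as the empty union type."""
    return all(ty.get(a, set()) <= ty.get(b, set()) for a, b in constraints)


def merge_versions(ty: Typing, versions: Dict[str, List[str]]) -> Typing:
    """ty(x) = union of ty(x_i) over all SSA versions x_i of x."""
    return {x: set().union(*(ty.get(v, set()) for v in vs)) for x, vs in versions.items()}
```

A constraint from a version typed `{Int, Str}` into one typed `{Str}` is reported as inconsistent, while merging both versions yields `{Int, Str}` for the source variable.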

By intra-procedural path-sensitivity we mean that the algorithm maintains a set of alternatives $\mathit{path}(L)$ for each program location $L$ and, for each $j \in \mathit{path}(L)$, separately derives the types of flow-sensitively tracked variables $ty(\eta(x, L), j)$ as well as the results of the previous sub-statement $ty(\llbracket S' \rrbracket, j)$ with $L = S'^{\to}$. In $V_{Ex}$, path sensitivity is only introduced by specially marked $\mathtt{if_{ps}}$-conditionals, which duplicate the number of paths leading to them. Note that eliminating equivalent alternatives is important to keep the analysis terminating in the presence of loops. However, as these markings are only used in the definition of trusted assumptions below, usually $|\mathit{path}(L)| = 1$ for all $L$.

Logic to Typings. As listed in the requirements in Section 2.4, verifiers $V_X$ need an interface for supplying trusted assumptions. Abstractly, one defines a refinement relation $ty \vdash^{\tau, L} ty'$ between $V_X$-typings. Here, $ty'$ refines $ty$ by taking the trusted assumption $\tau$ at program location $L$ into account. For $V_{Ex}$, we first extend dyn expressions by introducing a type filter operation $e ::= T \sqcap e$ generating the (monotone) typing constraint $T \sqcap \llbracket e \rrbracket \sqsubseteq \llbracket T \sqcap e \rrbracket$. By inserting type filters, it is possible to refine a typing assertion $\tau = \Sigma(ty, L)$ for a program location $L$ to $\tau \wedge \tau'$ for some assumption $\tau'$:

Definition 7 (Refinement of Constraint Systems). Let $G$ be a constraint system generated by applying $V_{Ex}$'s constraint generation to a program $\pi$. A conjunctive refinement step of $G$ using the trusted assumption $\tau$ at program location $L$ is a quadruple $(G, \tau, L, G')$, written $G \vdash^{\tau, L} G'$, with $G'$ being the constraint system generated by applying $V_{Ex}$'s constraint generation to the program $\pi$ with the marked conditional $S_{ta}(\tau)$ inserted just before program location $L$. For the definition of $S_{ta}(\tau)$, we assume without loss of generality $\tau = \nu_1 \vee \dots \vee \nu_n$ to be in disjunctive normal form, with all conjunctions $\nu_i$ mentioning each variable in at most one typing literal:

$$S_{ta}(\nu \vee \tau) = \mathtt{if_{ps}}\ \ldots\ \mathtt{then}\ S_{ta}(\nu)\ \mathtt{else}\ S_{ta}(\tau)\ \mathtt{fi},$$

$$S_{ta}(\lceil x \rceil \in T \wedge \nu) = S_{ta}(\lceil x \rceil \in T);\ S_{ta}(\nu), \qquad S_{ta}(\lceil x \rceil \in T) = x := T \sqcap x$$

Note that the condition does not matter, as $V_{Ex}$ regards conditionals as nondeterministic choice. In essence, if $\tau$ has $n$ disjuncts $\nu_j$, then all paths reaching the marked conditional are split into $n$ paths, and in each of them the types $\llbracket x_i \rrbracket$ of all variables $x_i \in \mathit{free}(\nu_j)$ are refined to $\llbracket x_i \rrbracket \sqcap \mathit{pr}_{x_i}(\nu_j)$ for program locations dominated by $L$.
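The path splitting performed by the marked conditional $S_{ta}(\tau)$ can be sketched in Python under two hypothetical representation choices: each path is a dictionary from flow-sensitively tracked variables to union types, and $\tau$ is given in DNF as a list of disjuncts, each mapping a variable to the type set of its single typing literal. Every disjunct yields its own path in which the mentioned variables are filtered by intersection, and identical paths are merged.

```python
from typing import Dict, List, Set

Typing = Dict[str, Set[str]]  # variable -> union type (set of class names)


def refine(paths: List[Typing], assumption: List[Dict[str, Set[str]]]) -> List[Typing]:
    """Split every incoming path by the disjuncts of a DNF assumption.

    Each disjunct maps a variable x to the type set T of its single typing
    literal "type(x) in T".  In the path created for that disjunct, the type
    of x is intersected with T (the effect of the type filter x := T meet x).
    Paths that end up identical are merged, mirroring the elimination of
    equivalent alternatives.  Every mentioned variable must be tracked in
    every path.
    """
    result: List[Typing] = []
    for env in paths:
        for disjunct in assumption:
            new_env = {x: set(ts) for x, ts in env.items()}
            for x, allowed in disjunct.items():
                new_env[x] = new_env[x] & allowed
            if new_env not in result:
                result.append(new_env)
    return result
```

With a disjunction of two singleton literals for `x`, one path per class results, whereas a single literal with the union type leaves one path in which `x` still has both types, so the two forms of assumption are not interchangeable.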

Regarding the disjunctive normal form in Definition 7, note that $\lceil x \rceil \in \{C_1\} \vee \lceil x \rceil \in \{C_2\}$ is not equivalent to $\lceil x \rceil \in \{C_1, C_2\}$! Otherwise, no assumption could type if b then x := "foo" else x := 21 end; x + x.

D Appendix: Translation from Typings to Typing Proofs

We will now show how to translate a given $V_{Ex}$-typing $ty$ into a typing proof $\psi$ with $ty_\psi = ty$. Note that, in contrast to type safety proofs, this is possible for every sound typing $ty$.

Recall from Section 6.2 that for a $V_{Ex}$-typing $ty$, the global typing invariant $I_{Ex}(ty)$ states that the types $ty(\llbracket C.@x \rrbracket)$ assigned to instance variables $C.@x$ in $ty$ safely over-approximate the actual types of these variables for all runs of the program. Establishing $I_{Ex}(ty)$ as an invariant of all method bodies and the main statement is thus an important step in constructing a typing proof. Fortunately, $I_{Ex}(ty)$ can be shown to be invariant under most proof rules in our logic. The only complicated cases are the rules for assignment to instance variables and object creation. For these cases, we introduce new rules that explicitly preserve the global typing invariant:

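RULE: Θ-IASGN

$$\frac{\{I_{Ex}(ty) \wedge \tau\}\ e\ \{I_{Ex}(ty) \wedge \tau'[\mathtt{this}.@x := r]\}}{\{I_{Ex}(ty) \wedge \tau\}\ @x := e\ \{I_{Ex}(ty) \wedge \tau'\}}$$

with $@x \in V_I$, where $\tau \to \lceil \mathtt{this} \rceil \in \{C\}$, $\tau' \to \lceil r \rceil \in T$, $T \in 2^{\mathcal{C}}$ and $T \subseteq ty(\llbracket V_C.@x \rrbracket)$.

AXIOM: Θ-NEW

$$\{I_{Ex}(ty) \wedge \tau[r := \mathtt{new}_C]\}\ \mathtt{new}_C\ \{I_{Ex}(ty) \wedge \tau\}$$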

Formally deriving soundness of these rules requires intricate details of the substitutions involved and rather lengthy proofs. Intuitively, assignment to instance variables preserves the global typing invariant if the assignment is compatible with the typing ($T \subseteq ty(\llbracket V_C.@x \rrbracket)$), and object creation does so because it initializes all instance variables to null, so the newly created object satisfies the global typing invariant. For a detailed treatment of the substitutions, the interested reader is referred to [5,1].

The function $\Xi$ defined below constructs typing proofs from proof steps like

$$\frac{\phi_1 \quad \dots \quad \phi_n}{\Delta}\,(\mathrm{RULE})$$

where $\Delta$ is a conclusion of the form $A \vdash \{p\}\ S\ \{q\}$, RULE is the name of the proof rule applied, and the $\phi_i$ for $1 \le i \le n$ are subproofs establishing the premises. For reasoning about recursive method calls, the REC rule needs a set of assumptions about the methods in $\pi$.

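Definition 8. For a dyn program $\pi$ with classes $\mathcal{C}$, each with a set of methods $\mathcal{M}_C$, and a typing $ty$ for $\pi$, the set $A_{\pi,ty}$ of method call assumptions is

$$A_{\pi,ty} = \{\{p_{C.m}\}\ v_0.m(v_1,\dots,v_n)\ \{q_{C.m}\} \mid C \in \mathcal{C}, m \in \mathcal{M}_C\}$$

where $p_{C.m} = I_{Ex}(ty) \wedge \lceil v_0 \rceil \in \{C\} \wedge \bigwedge_{i=1}^{n} \lceil v_i \rceil \in ty(\llbracket P^i_{C.m} \rrbracket) \wedge \bigwedge_{i=1}^{k} \lceil v_0.@v_i \rceil \in ty(\llbracket V_C.@v_i \rrbracket)$ and $q_{C.m} = I_{Ex}(ty) \wedge \lceil r \rceil \in ty(\llbracket R_{C.m} \rrbracket)$, method $m$ is of arity $n$ and $C$ has $k$ instance variables.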

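We are now ready to state the definition of $\Xi$:

Definition 9. Translation $\Xi$ for Programs

Given a dyn program $\pi$ with a main statement $S$ and a set of methods $\mathcal{M}$, and a $V_{Ex}$-typing $ty$ for $\pi$, the function $\Xi(\pi, ty)$ yields a typing proof for $ty$. It is defined as follows:

$$\Xi(\pi, ty) = \dfrac{\Xi(A_{\pi,ty} \vdash \{\mathit{true}\}\ S\ \{\mathit{true}\}, ty) \qquad \Xi(A_{\pi,ty} \vdash B_{C.m}, ty) \text{ for } C.m(\vec{u})\{S_{C.m}\} \in \mathcal{M}}{\vdash \{\mathit{true}\}\ S\ \{\mathit{true}\}}\,(\mathrm{REC})$$

with $B_{C.m} = \{p_{C.m}\}\ \mathtt{begin\ local}\ \mathtt{this}, \vec{u} := v_0, \vec{v};\ S_{C.m}\ \mathtt{end}\ \{q_{C.m}\}$, where $p_{C.m}$ and $q_{C.m}$ are given in Definition 8.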
Definition 10. Translation $\Xi$ for Statements and Expressions

For a given $V_{Ex}$-typing $ty$ of a dyn statement $S$ and a Hoare logic statement $X$ of the form

$$X = A \vdash \{I(ty) \wedge \Sigma(ty, {}^{\to}S)\}\ S\ \{I(ty) \wedge \Sigma(ty, S^{\to})\}$$

with $A$ a set of assumptions, $\Xi(X, ty)$ yields a typing proof for $ty$ with precondition $\Sigma(ty, {}^{\to}S)$ under the assumptions $A$.

$\Xi$ is defined inductively over the structure of $S$ and in essence models the reasoning of the verifier $V_{Ex}$ as an equivalent combination of Hoare logic rule applications. The Hoare triples of the above form are then assembled into a full typing proof.


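If $S = \mathtt{null}$ and $\Sigma(ty, {}^{\to}S) = \Sigma(ty, S^{\to})[r := \mathtt{null}]$, then

$$\Xi(X, ty) = \frac{}{A \vdash \{I(ty) \wedge \Sigma(ty, {}^{\to}S)\}\ \mathtt{null}\ \{I(ty) \wedge \Sigma(ty, S^{\to})\}}\,(\mathrm{CONST})$$

If $S = u$ and $\Sigma(ty, {}^{\to}S) = \Sigma(ty, S^{\to})[r := u]$, then

$$\Xi(X, ty) = \frac{}{A \vdash \{I(ty) \wedge \Sigma(ty, {}^{\to}S)\}\ u\ \{I(ty) \wedge \Sigma(ty, S^{\to})\}}\,(\mathrm{VAR})$$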


If $S = @x$ and $\Sigma(ty, {}^{\to}S) = \Sigma(ty, S^{\to})[r := \mathtt{this}.@x]$, then

$$\Xi(X, ty) = \frac{}{A \vdash \{I(ty) \wedge \Sigma(ty, {}^{\to}S)\}\ @x\ \{I(ty) \wedge \Sigma(ty, S^{\to})\}}\,(\mathrm{IVAR})$$


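If $S = u := e$, $\Sigma(ty, {}^{\to}S) \to \Sigma(ty, {}^{\to}e)$ and $\Sigma(ty, e^{\to}) \to \Sigma(ty, S^{\to})[u := r]$, then

$$\Xi(X, ty) = \dfrac{\dfrac{A \vdash \{I(ty) \wedge \Sigma(ty, {}^{\to}e)\}\ e\ \{I(ty) \wedge \Sigma(ty, e^{\to})\}}{A \vdash \{I(ty) \wedge \Sigma(ty, {}^{\to}S)\}\ e\ \{I(ty) \wedge \Sigma(ty, S^{\to})[u := r]\}}\,(\mathrm{CONS})}{A \vdash \{I(ty) \wedge \Sigma(ty, {}^{\to}S)\}\ u := e\ \{I(ty) \wedge \Sigma(ty, S^{\to})\}}\,(\mathrm{ASGN})$$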


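If $S = @x := e$, $\Sigma(ty, {}^{\to}S) \to \Sigma(ty, {}^{\to}e)$ and $\Sigma(ty, e^{\to}) \to \Sigma(ty, S^{\to})[\mathtt{this}.@x := r]$, then

$$\Xi(X, ty) = \dfrac{\dfrac{A \vdash \{I(ty) \wedge \Sigma(ty, {}^{\to}e)\}\ e\ \{I(ty) \wedge \Sigma(ty, e^{\to})\}}{A \vdash \{I(ty) \wedge \Sigma(ty, {}^{\to}S)\}\ e\ \{I(ty) \wedge \Sigma(ty, S^{\to})[\mathtt{this}.@x := r]\}}\,(\mathrm{CONS})}{A \vdash \{I(ty) \wedge \Sigma(ty, {}^{\to}S)\}\ @x := e\ \{I(ty) \wedge \Sigma(ty, S^{\to})\}}\,(\Theta\text{-}\mathrm{IASGN})$$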


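If $S = \mathtt{if}\ e\ \mathtt{then}\ S_1\ \mathtt{else}\ S_2\ \mathtt{end}$, $\Sigma(ty, e^{\to}) \to \lceil r \rceil \in \{\mathit{bool}\}$, $\Sigma(ty, {}^{\to}S) \to \Sigma(ty, {}^{\to}e)$, $\Sigma(ty, e^{\to}) \to \Sigma(ty, {}^{\to}S_1)$, $\Sigma(ty, e^{\to}) \to \Sigma(ty, {}^{\to}S_2)$, $\Sigma(ty, S_1^{\to}) \to \Sigma(ty, S^{\to})$ and $\Sigma(ty, S_2^{\to}) \to \Sigma(ty, S^{\to})$, then $\Xi(X, ty)$ is

$$\dfrac{\dfrac{A \vdash \{I(ty) \wedge \Sigma(ty, {}^{\to}e)\}\ e\ \{I(ty) \wedge \Sigma(ty, e^{\to})\}}{A \vdash \{I(ty) \wedge \Sigma(ty, {}^{\to}S)\}\ e\ \{I(ty) \wedge \Sigma(ty, e^{\to})\}}\,(\mathrm{CONS}) \quad \dfrac{A \vdash \{I(ty) \wedge \Sigma(ty, {}^{\to}S_1)\}\ S_1\ \{I(ty) \wedge \Sigma(ty, S_1^{\to})\}}{A \vdash \{I(ty) \wedge \Sigma(ty, e^{\to})\}\ S_1\ \{I(ty) \wedge \Sigma(ty, S^{\to})\}}\,(\mathrm{CONS}) \quad \dfrac{A \vdash \{I(ty) \wedge \Sigma(ty, {}^{\to}S_2)\}\ S_2\ \{I(ty) \wedge \Sigma(ty, S_2^{\to})\}}{A \vdash \{I(ty) \wedge \Sigma(ty, e^{\to})\}\ S_2\ \{I(ty) \wedge \Sigma(ty, S^{\to})\}}\,(\mathrm{CONS})}{A \vdash \{I(ty) \wedge \Sigma(ty, {}^{\to}S)\}\ \mathtt{if}\ e\ \mathtt{then}\ S_1\ \mathtt{else}\ S_2\ \mathtt{end}\ \{I(ty) \wedge \Sigma(ty, S^{\to})\}}\,(\mathrm{COND})$$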


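If $S = e_0.m(e_1,\dots,e_n)$, $\Sigma(ty, {}^{\to}S) \to \Sigma(ty, {}^{\to}e_0)$, $\Sigma(ty, e_i^{\to}) \to \Sigma(ty, {}^{\to}e_{i+1})$ for $0 \le i < n$, and $\{I(ty) \wedge \Sigma(ty, e_n^{\to})\}\ u_0.m(u_1,\dots,u_n)\ \{I(ty) \wedge \Sigma(ty, S^{\to})\} \in A$, then

$$\Xi(X, ty) = \dfrac{\dfrac{A \vdash \{I(ty) \wedge \Sigma(ty, {}^{\to}e_i)\}\ e_i\ \{I(ty) \wedge \Sigma(ty, e_i^{\to})\}}{A \vdash \{p_i\}\ e_i\ \{p_{i+1}[u_i := r]\}}\,(\mathrm{CONS}) \text{ for } i \in \mathbb{N}_n \qquad \dfrac{}{A \vdash \{p_{n+1}\}\ u_0.m(u_1,\dots,u_n)\ \{I(ty) \wedge \Sigma(ty, S^{\to})\}}\,(\mathrm{CONS} + A)}{A \vdash \{I(ty) \wedge \Sigma(ty, {}^{\to}S)\}\ e_0.m(e_1,\dots,e_n)\ \{I(ty) \wedge \Sigma(ty, S^{\to})\}}\,(\mathrm{METH})$$

with $p_0 = I(ty) \wedge \Sigma(ty, {}^{\to}S)$ and $p_{i+1} = I(ty) \wedge \Sigma(ty, e_i^{\to})$ for $i \in \mathbb{N}_n$.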


Lemma 2. For every dyn program $\pi$ and every consistent $V_{Ex}$-typing $ty$ of $\pi$, $\Xi(\pi, ty)$ can be constructed and is a valid typing proof for $ty$ in $\pi$.

Proof. By induction over the structure of the program $\pi$, comparing the application conditions of Hoare logic rules, the typing rules for $V_{Ex}$ and the preconditions for the respective cases in the translation $\Xi$. □

Note that this implies soundness of $V_{Ex}$.


E Appendix: Omitted Proofs 

Proof for Theorem 1 

Proof. By definition of $\Theta(\sigma)$, for all variables $x$ of a base type $T$ in $\sigma$, $\Theta(\sigma) \models \mathit{safe}_T(x)$ holds, and $x$ can hence be safely mapped. Under the assumption that for all such variables $x$ it holds that $\llbracket x \rrbracket(\Theta(\sigma)) = \llbracket x \rrbracket(\sigma)$, the following lemma can be established by induction over the structure of the assertion language: $\llbracket l \rrbracket(\sigma) = \llbracket \Theta(l) \rrbracket(\Theta(\sigma))$ for all logical expressions $l$ and stat states $\sigma$. As the assumption is guaranteed by the mapping predicates introduced by $T_m$, the desired result can then be established by induction over the structure of the assertion language. □

Proof for Theorem 2 

Proof. By induction over the structure of the proof $\phi$, using Theorem 1 and the fact that the application conditions for the pure expression rules are satisfied when $S$ is a statically typed program and all assertions were translated using $\Theta$.
Proof for Lemma 1

Proof. $p' = p \wedge \tau$ has the described properties.

Proof for Theorem 3 

Proof. Follows from Lemma ?? and the definition of the most precise typing 
assertion. 

Proof for Theorem 4 

Proof. Follows from completeness of the Hoare logic, Theorem 3 and the fact 
that type safety proofs must establish the absence of type errors. 

Proof for Lemma 2 

Proof. By comparing the application conditions for the cases of $\Xi$ with the typing rules given in Figure 9 and the side conditions of the applied Hoare logic rules. For instance, in the case for assignment, consistency of the typing implies $ty(\llbracket e \rrbracket) = ty(\llbracket u_i := e \rrbracket)$ and $ty(\llbracket e \rrbracket) \sqsubseteq ty(\llbracket u_i \rrbracket)$. By the definitions for SSA, it follows that $ty(\llbracket e \rrbracket) = ty(\llbracket u := e \rrbracket)$ and $ty(\llbracket e \rrbracket) \sqsubseteq ty(\llbracket u \rrbracket)$, and that $u$ is the only variable whose (flow-sensitive) type may have changed between $e^{\to}$ and $S^{\to}$ ($r$ stays the same). With the definition of $\Sigma(ty, L)$, we conclude that $\Sigma(ty, e^{\to}) \Leftrightarrow \Sigma(ty, S^{\to})[u := r] \wedge \lceil u \rceil \in t$ for some type $t$. □

Proof of Theorem 5 

Proof. Let $ty \vdash^{\tau, L} ty'$ be a conjunctive refinement step and $x \in V_{C.m}$. Then $ty'(\llbracket x \rrbracket, L) = ty(\llbracket x \rrbracket, L) \sqcap \mathit{pr}_x(\tau) \sqsubseteq ty(\llbracket x \rrbracket, L)$. This difference is induced by the constraints generated from $S_{ta}(\tau)$. Since all other constraints are identical between $\pi$ and $\pi'$ and all constraints are monotonic, $ty'(\llbracket S \rrbracket, L) \sqsubseteq ty(\llbracket S \rrbracket, L)$ for all sub-statements $S$ of $\pi$, and consequently $ty' \sqsubseteq ty$ follows by induction over the constraint system. □

Proof for Theorem 6 

For our proof we need the following definition: 

Definition 11 (Fusion of Hoare Logic Proofs). Let $\phi$ be a Hoare logic proof for $\{p\}\ S\ \{q\}$ in some notion of correctness $X$ and $\psi$ be a typing proof for $\{\tau\}\ S\ \{\tau'\}$. Then, the fusion $\phi + \psi$ is a two-layered proof for $\{p \wedge \tau\}\ S\ \{q \wedge \tau'\}$ in the sense of $X$-correctness.

Without loss of generality, we assume $\phi$ and $\psi$ to be minimal (all Hoare triples contribute to the proof's conclusion). They hence have a tree-like structure. Their fusion can then be constructed by recursion over this structure.

Induction basis: Fusing axioms. All axioms in our Hoare logic (Appendix B) are invariant under conjunction: if $\{p\}\ S\ \{q\}$ and $\{\tau\}\ S\ \{\tau'\}$ can be derived using an axiom, then $\{p \wedge \tau\}\ S\ \{q \wedge \tau'\}$ can also be derived using it.

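Induction step: Fusing rules. All rules in our Hoare logic have the following properties:

- Invariant under fusion: If

$$\frac{\{p_1\}\ S_1\ \{q_1\} \quad \dots \quad \{p_n\}\ S_n\ \{q_n\}}{\{p\}\ S\ \{q\}}\,(X) \qquad \text{and} \qquad \frac{\{\tau_1\}\ S_1\ \{\tau_1'\} \quad \dots \quad \{\tau_n\}\ S_n\ \{\tau_n'\}}{\{\tau\}\ S\ \{\tau'\}}\,(X)$$

are valid rule applications, then

$$\frac{\{p_1 \wedge \tau_1\}\ S_1\ \{q_1 \wedge \tau_1'\} \quad \dots \quad \{p_n \wedge \tau_n\}\ S_n\ \{q_n \wedge \tau_n'\}}{\{p \wedge \tau\}\ S\ \{q \wedge \tau'\}}\,(X)$$

is also.

- They are either syntax-directed (and hence must appear in both proofs) or have a neutral application of rule X that can be inserted into a proof to make its structure match the other one (having an application of rule X).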

For most rules, this is obvious. For applications of CONJ and DISJ, one needs 
to fuse the proof with both premises. To see that the properties hold for the 
SUBST rule, consider that all variables occurring in typing assertions are being 
read in some method of the program (otherwise, typing them is useless). Hence, 
the side-condition of the SUBST rule does not allow them to be substituted for 
and all applications of this rule hence are neutral for all typing assertions. 

Both proofs can hence be made structurally equivalent by inserting neutral 
rule applications and then fused using the invariance property. □ 
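The recursive construction of $\phi + \psi$ can be sketched in Python on a hypothetical proof-tree representation in which assertions are plain strings and `&` stands for conjunction. The sketch assumes that neutral rule applications have already been inserted so that both trees have the same shape; it neither checks side conditions nor validates the individual rule applications.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Proof:
    """A Hoare logic proof tree: the rule applied at the root, its conclusion
    {pre} stmt {post}, and the subproofs establishing its premises."""
    rule: str
    pre: str
    stmt: str
    post: str
    premises: List["Proof"] = field(default_factory=list)


def fuse(phi: Proof, psi: Proof) -> Proof:
    """Fuse two structurally equivalent proofs of the same statement into a
    proof whose triples are the pairwise conjunctions of the original ones."""
    if (phi.rule != psi.rule or phi.stmt != psi.stmt
            or len(phi.premises) != len(psi.premises)):
        raise ValueError("proofs are not structurally equivalent")
    return Proof(
        rule=phi.rule,
        pre=f"({phi.pre}) & ({psi.pre})",
        stmt=phi.stmt,
        post=f"({phi.post}) & ({psi.post})",
        premises=[fuse(a, b) for a, b in zip(phi.premises, psi.premises)],
    )
```

The leaves correspond to the induction basis (fused axiom instances) and every inner node to one fused rule application, which is exactly the invariance-under-fusion property used in the proof.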



